The aim of this thesis has been to develop a reliable tool that helps radiologists detect breast cancer in mammographic images. We began by studying and analyzing the proposals found in the literature. From this study, we concluded that none of the proposals performed optimally in all cases. Moreover, we showed that the shape and size of the masses and the breast density are parameters that affect the performance of those methods.
Based on these conclusions, we developed a new algorithm that takes these three parameters into account. The shape and size of the masses are learned by creating a set of templates, which are subsequently matched against the mammogram. This matching step is performed using a Bayesian approach. Because this step yields a large number of false positives (suspicious regions that are actually normal tissue), we also developed a novel false positive reduction algorithm based on the recently developed 2DPCA approach.
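To make the false positive reduction step more concrete, the following is a minimal sketch of two-dimensional PCA applied to RoIs, assuming all RoIs have been resized to a common shape. The function names (`fit_2dpca`, `project_2dpca`, `nearest_label`) and the nearest-neighbour decision rule are illustrative assumptions, not necessarily the classifier used in this thesis.

```python
import numpy as np

def fit_2dpca(rois, n_components):
    """Return the 2DPCA projection axes for a stack of RoIs of shape (N, h, w)."""
    rois = np.asarray(rois, dtype=float)
    centered = rois - rois.mean(axis=0)
    # Image covariance matrix (w x w): average of A^T A over the centered RoIs.
    cov = np.einsum("nij,nik->jk", centered, centered) / len(rois)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]  # shape (w, n_components)

def project_2dpca(roi, axes):
    """Project one RoI (h x w) onto the axes, giving an (h x d) feature matrix."""
    return np.asarray(roi, dtype=float) @ axes

def nearest_label(roi, axes, train_features, train_labels):
    """Hypothetical decision rule: label of the closest training feature matrix."""
    feat = project_2dpca(roi, axes)
    dists = [np.linalg.norm(feat - f) for f in train_features]
    return train_labels[int(np.argmin(dists))]
```

Unlike classical PCA, the RoI is never flattened into a vector, so the covariance matrix has only as many rows as the RoI has columns. A suspicious region returned by the template matching would be kept only if its label under such a rule is "mass".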
To integrate breast density information into the algorithm, we first studied the existing proposals. We found that few approaches classify the breast according to the BIRADS categories, the standard currently used by radiologists to classify the internal tissue of the breast. Thus, we developed a new method based on grouping the pixels according to their appearance (fatty or dense). Texture features are then extracted from each cluster and used to classify the breast into the BIRADS categories.
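The grouping of pixels into a fatty and a dense cluster can be illustrated with a small sketch. It assumes a plain two-class k-means on grey levels as a stand-in for the clustering step, and it uses simple first-order statistics in place of the texture features; both choices, as well as the function names, are assumptions made for illustration only.

```python
import math

def two_class_kmeans(values, n_iter=50):
    """Split grey levels into two clusters (0 = darker/fatty, 1 = brighter/dense)."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            # Keep the previous centre if a cluster becomes empty.
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def cluster_features(values, labels):
    """Per-cluster area fraction, mean and standard deviation (illustrative features)."""
    features = {}
    for k in (0, 1):
        members = [v for v, lab in zip(values, labels) if lab == k]
        if not members:
            features[k] = {"fraction": 0.0, "mean": 0.0, "std": 0.0}
            continue
        mean = sum(members) / len(members)
        var = sum((v - mean) ** 2 for v in members) / len(members)
        features[k] = {"fraction": len(members) / len(values),
                       "mean": mean, "std": math.sqrt(var)}
    return features
```

The resulting per-cluster feature vectors would then be passed to a classifier that assigns one of the BIRADS categories to the whole breast.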
Once the mammogram has been classified according to its density class, the proposed mass detection algorithm is used to detect the masses, but now trained only with RoIs belonging to the same density class. The results obtained using the DDSM database to train the system, and the MIAS and Trueta databases for testing, demonstrate the feasibility of our proposal.
Moreover, the fact that the system was trained on a database different from those used for testing shows the robustness of the approach. We also conclude that the false positive reduction approach is not as robust as the template matching algorithm, because the 2DPCA approach loses effectiveness when training and testing are done with different databases. In contrast, the proposed template matching does not depend strongly on this aspect.
As a final overview of the developed work, Table shows a comparison of the performance of our approach on the set of mammograms extracted from the MIAS database, with confirmed masses and being normal mammograms. In the table, the training database is specified (MIAS or DDSM) as well as the approach used: the Bayesian pattern matching , with false positive reduction , and including the breast density information . The mean and the number of false positives per image at a given sensitivity ( ) are included in the table.
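The number of false positives per image at a given sensitivity can be read off a ranked list of detections. The sketch below shows one way to compute it, assuming each detection is a pair of a confidence score and either the identifier of the mass it hits or `None` for a false positive; the function name and this data layout are hypothetical, and tied scores are not treated specially.

```python
def fp_per_image_at_sensitivity(detections, n_masses, n_images, target_sensitivity):
    """False positives per image at the first threshold reaching the target sensitivity.

    detections: list of (score, mass_id) pairs, with mass_id None for a false positive.
    Returns None if the target sensitivity is never reached.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    found = set()          # distinct masses detected so far
    false_positives = 0    # false detections accepted so far
    for _score, mass_id in ordered:
        if mass_id is None:
            false_positives += 1
        else:
            found.add(mass_id)
        if len(found) / n_masses >= target_sensitivity:
            return false_positives / n_images
    return None
```

Lowering the acceptance threshold moves down the ranked list, so a higher target sensitivity can only keep or increase the reported number of false positives per image.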
The best results in both and false positives per image are obtained by using the same database for training and testing the system. Specifically, the best accuracy is obtained without the false positive reduction step, while the smallest number of false positives is obtained when including it. Training on a different database both reduces the mean and increases the number of false positives. This is due to the different nature of the databases: for example, the training database contains more small masses than the testing one, which leads to a high number of false positives at smaller sizes. On the other hand, introducing the breast density information increases the accuracy with which masses are detected and decreases the number of false positives per image.