The aim of this thesis has been the development of a reliable tool to help radiologists detect breast cancer in mammographic images. We began by studying and analyzing the proposals found in the literature. From this study, we concluded that none of the proposals performed optimally in all cases. Moreover, we showed that the shape and size of the masses and the breast density are parameters that affect the performance of those methods.
Based on these conclusions, we developed a new algorithm that takes these three parameters into account. The shape and size of the masses are learned by creating a set of templates, which are subsequently matched against the mammogram. This matching step is performed using a Bayesian approach. Moreover, as this step produces a large number of false positives (suspicious regions that are actually normal tissue), we developed a novel false positive reduction algorithm based on the recently developed 2DPCA approach.
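As an illustration of the 2DPCA idea mentioned above, the following is a minimal sketch and not the implementation used in this thesis. It assumes the standard formulation of 2DPCA, in which an image covariance matrix is built directly from the 2D RoIs and each RoI is projected onto its leading eigenvectors. The function names `fit_2dpca` and `project_2dpca`, the RoI dimensions and the number of components are hypothetical choices made only for this example.

```python
import numpy as np

def fit_2dpca(rois, n_components):
    """Learn a 2DPCA projection from a stack of RoIs.

    rois: array of shape (M, m, n), i.e. M RoIs of m x n pixels (assumed sizes).
    Returns the mean RoI (m x n) and a projection matrix (n x n_components).
    """
    rois = np.asarray(rois, dtype=float)
    mean_image = rois.mean(axis=0)
    centered = rois - mean_image
    # Image covariance matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), size n x n.
    G = np.einsum("kij,kil->jl", centered, centered) / rois.shape[0]
    # G is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean_image, eigvecs[:, order]

def project_2dpca(roi, projection):
    """Project one m x n RoI onto the learned axes, giving an m x d feature matrix."""
    return np.asarray(roi, dtype=float) @ projection
```

In a false positive reduction setting, the resulting feature matrices of a suspicious region would be compared against those of training RoIs labelled as mass or normal tissue; the classifier used for that comparison is outside the scope of this sketch.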
To integrate breast density information into the algorithm, we first studied the existing proposals. We found that few approaches classify the breast according to the BIRADS categories, the standard currently used by radiologists to classify the breast's internal tissue. Thus, we developed a new method based on grouping the pixels according to their appearance (fatty or dense). Afterwards, texture features were extracted from each cluster and used to classify the breast into the BIRADS categories.
Once the mammogram has been classified according to its density class, the proposed mass detection algorithm is used to detect the masses, now trained only with RoIs belonging to the same density class. Results obtained using the DDSM database to train the system and the MIAS and Trueta databases for testing demonstrate the feasibility of our proposal.
Moreover, the fact that the system is trained on a different database from those used for testing shows the robustness of the approach. We also conclude that the false positive reduction approach is not as robust as the template matching algorithm, because the 2DPCA approach loses effectiveness when training and testing are performed with different databases. In contrast, the proposed template matching does not depend strongly on this aspect.
As a final overview of the developed work, the table shows a comparison of the performance of our approach on the set of mammograms extracted from the MIAS database, which comprises mammograms with confirmed masses and normal mammograms. The table specifies the training database (MIAS or DDSM) as well as the approach used: the Bayesian pattern matching alone, with false positive reduction, and including the breast density information. The mean accuracy and the number of false positives per image at a given sensitivity are included in the table.
The best results, in both mean accuracy and false positives per image, are obtained by using the same database for training and testing the system. Specifically, the best accuracy is obtained without the false positive reduction step, while the smallest number of false positives is obtained when including it. Training with a different database both reduces the mean accuracy and increases the number of false positives. This is due to the different nature of the databases: for example, the training database contains more small masses than the testing one, which leads to a high number of false positives at smaller sizes. On the other hand, introducing the breast density information increases the accuracy with which masses are detected and decreases the number of false positives per image.
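To make the "false positives per image at a given sensitivity" figure concrete, the sketch below shows one simple way such an operating point could be read from a ranked list of detections. It is an illustrative assumption, not the evaluation code used for the reported results: the function name `fp_per_image_at_sensitivity` is hypothetical, each true mass is assumed to be hit by at most one detection, and tied scores are ranked in input order rather than grouped at a shared threshold.

```python
import numpy as np

def fp_per_image_at_sensitivity(scores, is_true_mass, n_masses, n_images,
                                target_sensitivity):
    """Return false positives per image at the first operating point whose
    sensitivity reaches target_sensitivity, or None if it is never reached.

    scores: confidence of each detection over the whole test set.
    is_true_mass: True where the detection hits a confirmed mass.
    """
    scores = np.asarray(scores, dtype=float)
    is_true_mass = np.asarray(is_true_mass, dtype=bool)
    # Rank detections from most to least confident.
    order = np.argsort(-scores, kind="stable")
    hits = np.cumsum(is_true_mass[order])
    false_alarms = np.cumsum(~is_true_mass[order])
    sensitivity = hits / n_masses
    reached = np.nonzero(sensitivity >= target_sensitivity)[0]
    if reached.size == 0:
        return None
    return false_alarms[reached[0]] / n_images
```

For example, with five detections over two images, three confirmed masses and alternating true and false detections in decreasing score order, the function reports 0.5 false positives per image at a sensitivity of 0.6.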