In this section, we test our approach using the same training set but with a test set drawn from the Trueta digital database. This test set is composed of MLO and CC view mammograms, each containing at least one mass.
The evaluation is done using ROC analysis, with the set of DDSM RoIs depicting masses used for the template construction and the remaining RoIs for the false positive reduction model. To measure how breast density misclassification affects the performance of the system, we run the experiment twice: first, taking the breast density as annotated in the database, and second, classifying the breasts using the algorithm proposed in Chapter .
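As a reminder of what the ROC analysis summarises, the following is a minimal sketch of the area under the ROC curve computed through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen mass receives a higher score than a randomly chosen normal region. The function name `roc_auc` and the toy scores are illustrative assumptions, not the implementation or data used in this work.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: detector outputs, higher means more mass-like (assumed convention).
    labels: 1 for a true mass, 0 for normal tissue.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two masses and two normal regions.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(roc_auc(toy_scores, toy_labels))  # 0.75
```

Running the same function on the scores obtained with the manual and with the automatic density labels gives two directly comparable figures, which is the comparison this experiment relies on.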
Table shows the confusion matrices of the classification for both MLO and CC views. The algorithm clearly performed better on MLO mammograms than on CC ones. The kappa value for the former is , which according to Table lies in the upper part of the substantial agreement range. In contrast, the kappa value for CC views is , which lies on the border between moderate and substantial agreement. At class level, almost all mammograms belonging to BIRADS I are classified correctly for MLO views, while for CC views the performance is lower. Moreover, the two mammograms belonging to BIRADS IV are misclassified in both confusion matrices.
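For reference, the kappa statistic can be derived directly from a confusion matrix. The sketch below computes Cohen's kappa and maps it to an agreement label, assuming that the agreement scale referred to above is the commonly used Landis and Koch one. The function names and the toy matrix are hypothetical and do not reproduce the values reported in this section.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j]: number of cases with reference class i and predicted class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: product of the marginals, summed over classes.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Agreement label on the Landis and Koch scale (assumed here)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical two-class example, not data from this work.
toy = [[20, 5],
       [10, 15]]
k = cohen_kappa(toy)
print(round(k, 2), agreement_label(k))
```

Because kappa discounts the agreement expected by chance, a class with very few cases, such as the two BIRADS IV mammograms here, can be entirely misclassified while the overall value stays in the substantial range.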
On the other hand, Table shows the results obtained when training the proposed segmentation algorithms using the RoIs clustered according to both density annotations, manual and automatic. Note that, in general, both sets of results are less satisfactory than the ones obtained using the MIAS database (see Table ). The main reason is the false positive reduction algorithm, which is still trained using digitized RoIs rather than digital ones. This is the same effect we noticed when comparing the results obtained with the MIAS database, but even more pronounced.
Comparing the results according to the origin of the annotations, note that the results obtained using the automatic estimation are higher, by almost , than the ones obtained using the manual annotations. This suggests that the automatic method captures the mammogram appearance more objectively than a human expert, even though the mammogram will probably be considered misclassified according to the expert's opinion.