The performance of our approach is evaluated using a total of mammograms extracted from the MIAS mammographic database [169]. Among them, show confirmed masses (with the ground truth provided by an expert), while the rest are normal mammograms. The MIAS database is used for evaluation because it provides accurate manual annotations of the lesions. However, it contains too few cases to build the statistical models for the detection and false positive reduction steps when breast density information is used. The DDSM database [68], on the other hand, offers less accurate annotations but a larger number of cases. For this reason, the DDSM database is used for statistical training (recall that the algorithm does not need an accurate set of manual annotations) and the MIAS database for testing the mass detection accuracy. Hence, two databases of RoIs have been extracted from the DDSM database, containing both masses and non-masses. The DDSM database is divided into BIRADS classes, and each class is further divided according to lesion size. Again, we used six different size intervals: , and the number of masses in each interval was, respectively, , , , , , and . Moreover, for the false positive reduction learning step, normal RoIs for each mass RoI were included in each size cluster.
The evaluation is again done using Free-response Receiver Operating Characteristic (FROC) and Receiver Operating Characteristic (ROC) analysis. Recall that FROC analysis quantifies the ability of the algorithms to distinguish between mammograms with and without masses, while a ROC curve indicates the accuracy with which the masses are detected.
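As a reminder of how the operating points of a FROC curve are obtained, the following is a minimal sketch and not the implementation used in this work. It assumes each candidate region carries a suspicion score and either the identifier of the annotated mass it overlaps or `None` when it lies on normal tissue. The function name `froc_points`, the data layout, and the thresholds are hypothetical.

```python
from typing import Hashable, List, Optional, Tuple

Detection = Tuple[float, Optional[Hashable]]  # (score, lesion id or None)


def froc_points(
    detections: List[Detection],
    n_lesions: int,
    n_images: int,
    thresholds: List[float],
) -> List[Tuple[float, float]]:
    """Return (false positives per image, sensitivity) for each threshold.

    Assumption: a lesion counts as detected once, even if several
    candidate regions overlap it.
    """
    points = []
    for t in thresholds:
        # Keep only candidates at least as suspicious as the threshold.
        kept = [lesion for score, lesion in detections if score >= t]
        hit_lesions = {lesion for lesion in kept if lesion is not None}
        false_positives = sum(1 for lesion in kept if lesion is None)
        points.append((false_positives / n_images, len(hit_lesions) / n_lesions))
    return points


# Toy data (hypothetical): two images, two annotated masses "a" and "b".
toy_detections = [
    (0.9, "a"), (0.8, None), (0.7, "b"),
    (0.6, None), (0.4, "a"), (0.3, None),
]
toy_curve = froc_points(toy_detections, n_lesions=2, n_images=2,
                        thresholds=[0.85, 0.5, 0.0])
```

Lowering the threshold moves along the curve towards higher sensitivity at the cost of more false positives per image, which is the trade-off read off the FROC plots discussed next.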
Figure shows the performance of the presented algorithms. The grey line with pentagrams corresponds to the proposed template matching, which produces a high number of false positives per image (regions marked as suspicious that are actually normal tissue). This number is clearly reduced by the false positive reduction algorithm (the black line with pentagrams). The lines with hexagrams are obtained when breast density information is included. Note that including this information improves the performance of both approaches. For instance, at a sensitivity of , the algorithm without false positive reduction improves from false positives per image to , while with the false positive reduction step it improves from to .
A comparison between our approach and the algorithms d1 and d2 is also provided in Figure . Our approach (the black line with hexagrams) outperforms both algorithms, with a performance that lies between the results shown in Figure and in Figure . Recall that those figures correspond to results obtained using the same database for both training and testing (Figure ) and to results obtained using different databases (Figure ). For instance, at a sensitivity of , the mean number of false positives per image is now , which lies between when training and testing on the same database and when using different databases.
Once the mammograms containing masses are detected, ROC analysis is performed on them. The overall performance over the mammograms containing masses resulted in a value of and without and with breast density information, respectively. Thus, introducing this information has two effects: first, the mean is increased, and second, the deviation is reduced, which shows that this information is also beneficial in those cases where the algorithm has lower accuracy. Compared with algorithms d1 ( ) and d2 ( ), the proposal is clearly better than d1 and similar to d2, despite the drawback of being trained and tested using different databases.
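To illustrate how a per-mammogram ROC summary and its mean and deviation could be computed, the sketch below assumes that the summarised quantity is the area under the ROC curve. This is an assumption made for the example and is not stated in this section. The area is computed with the rank (Mann-Whitney) formulation over scores of mass and non-mass samples. The function names and the toy values are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def roc_area(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def summarise(areas: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over per-mammogram areas."""
    return mean(areas), stdev(areas)


# Toy example (hypothetical scores for one mammogram).
toy_area = roc_area([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

# Toy per-mammogram areas (hypothetical) summarised as mean and deviation.
toy_mean, toy_std = summarise([0.8, 0.9, 1.0])
```

A higher mean together with a lower deviation, as reported above when breast density is included, indicates that the gain is not confined to the easy cases.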
Table shows the effect of the lesion size for the different algorithms in terms of the mean and standard deviation of the values. Our proposal performs similarly for every size except for the range , which obtains the best results. Including the breast density information improves the results in all classes except for the group. This can be explained by the fact that the system has more information about the shapes of larger masses when there are more instances in the training database. Note also that the problem with the smallest size described in Section is now partially solved, with results that are also better than those of algorithm d2.
We include in Table a comparison of the performance of the algorithm according to the breast density. The table details both the BIRADS categories (using the consensus opinion of three different radiologists) and the fatty/glandular/dense annotations found in the MIAS database. Note that, regardless of the classification criterion used, the performance of the algorithm is largely independent of breast density. For instance, for the BIRADS categories all classes behave similarly except BIRADS II, where the algorithm performs slightly better. For the three-class annotations, the dense class performs slightly worse than the other two. This lower performance in the dense class is less evident with the BIRADS categories because the mammograms where the algorithm performs slightly worse are distributed between BIRADS III and IV.
In Chapter we also concluded that not only the lesion size and the breast tissue but also the shape of the mass affects the performance of the algorithms. In that sense, Table shows the performance of the algorithm according to the mass shape: circular or spiculated. Note that the algorithm performs slightly better for circular masses than for spiculated ones, although this difference is not significant.