We analyze in this section the performance of the algorithm when using different databases for learning the size and shape of the masses and for detecting them in the images. For this task, we used the MIAS database to test the system and the DDSM database [68] for training. Due to the large mass variability of the DDSM database, we used six different sizes to train the system: , and the number of masses in each interval was , , , , , and masses, respectively. Moreover, for the false positive reduction step, normal RoIs for each mass RoI were included in each size cluster.
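As an illustration of this training setup, the following is a minimal sketch of how the DDSM mass RoIs could be grouped into size clusters, with the sampled normal RoIs attached to each cluster for the false positive reduction step. The data layout and the function name `build_size_clusters` are assumptions for illustration, not the actual training code.

```python
from collections import defaultdict

def build_size_clusters(mass_rois, normal_rois_per_mass, size_edges):
    """Group mass RoIs into size intervals bounded above by `size_edges`,
    attaching the normal-tissue RoIs sampled for each mass RoI.

    Assumed layout: each RoI is a dict with "id" and "size" keys, and
    `normal_rois_per_mass` maps a mass RoI id to its normal RoIs."""
    clusters = defaultdict(lambda: {"mass": [], "normal": []})
    for roi in mass_rois:
        for i, upper in enumerate(size_edges):
            if roi["size"] <= upper:
                clusters[i]["mass"].append(roi)
                # Normal RoIs train the false positive reduction classifier
                # of the same size cluster as their associated mass.
                clusters[i]["normal"].extend(normal_rois_per_mass[roi["id"]])
                break
    return clusters
```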
Figure shows the performance of the algorithm. The grey lines show the performance of the proposal without the false positive reduction step, while the black ones include it. The lines with squares are obtained when training and testing with the same database, while the lines with pentagrams are obtained when training and testing with different databases. We can see that the Bayesian pattern matching produces more false positives per image when it is trained with a different database. This is basically due to the fact that we are now training with more sizes and, moreover, with smaller patterns. Thus, there is a large set of small regions that are normal tissue but are detected as suspicious by the algorithm. However, the false positive reduction step greatly reduces this number, although the performance is slightly worse than when training and testing with the same database. For instance, at a sensitivity of , the number of false positives per image when training and testing with different databases was without the false positive reduction algorithm and when including it, while when training and testing with the same database the false positives were and , respectively.
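For reference, each operating point of these FROC curves pairs the fraction of detected masses with the mean number of false positives per image at a given detection threshold. The sketch below shows one way such a point could be computed; the per-image data layout and the function name `froc_point` are illustrative assumptions, not the actual evaluation code.

```python
def froc_point(images, threshold):
    """One FROC operating point: sensitivity (detected masses / total masses)
    versus the mean number of non-mass detections per image.

    Assumed layout: each image dict has "n_masses" and a "candidates" list,
    where each candidate has a "score" and a "matched_mass" label (None if
    the candidate does not overlap any annotated mass)."""
    total_masses = sum(img["n_masses"] for img in images)
    detected, false_positives = 0, 0
    for img in images:
        hit_masses = set()
        for cand in img["candidates"]:
            if cand["score"] < threshold:
                continue
            if cand["matched_mass"] is not None:
                hit_masses.add(cand["matched_mass"])  # each mass counted once
            else:
                false_positives += 1
        detected += len(hit_masses)
    return detected / total_masses, false_positives / len(images)
```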
Figure compares the algorithm trained with DDSM and tested on the set of mammograms from the MIAS database with algorithms d1 and d2. Note that the performance of the proposal Eig is similar to that of algorithm d2 at sensitivities around . In contrast, it is clearly better at higher sensitivities and worse at intermediate sensitivities. Note that when the false positive reduction step is included, the performance is clearly better.
On the other hand, using ROC analysis for the set of mammograms containing masses, we found that the mean $A_z$ without false positive reduction was , while including it the value was . These results are slightly worse than those obtained when the algorithm was trained and tested using the same database ( and , respectively), and also worse than those of algorithm d2 ( ). However, note that this algorithm is still trained and tested using the same database. In contrast, both proposals outperform algorithm d1 ( ).
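The $A_z$ figure used here is the area under the ROC curve. As a minimal sketch, it can be estimated from the suspicion scores of mass and normal regions with a rank-based (Wilcoxon-Mann-Whitney) statistic, as below; this estimator, the function name, and the data layout are assumptions for illustration rather than the fitted ROC model typically used by dedicated ROC software.

```python
def az_value(mass_scores, normal_scores):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney statistic:
    the probability that a mass region scores above a normal region."""
    wins = sum((m > n) + 0.5 * (m == n)
               for m in mass_scores for n in normal_scores)
    return wins / (len(mass_scores) * len(normal_scores))

# Example: perfect separation gives Az = 1.0, full overlap gives Az = 0.5.
print(az_value([0.9, 0.8], [0.2, 0.1]))  # 1.0
print(az_value([0.5, 0.5], [0.5, 0.5]))  # 0.5
```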
Table shows the mean $A_z$ values detailed per mass size when training and testing were done using the same database or different databases. Note that the main performance drop is for the smaller masses, where the mean $A_z$ is reduced by around units. This is basically due to the number of false positives detected by the template matching algorithm at small template sizes. The false positive reduction step decreases the number of false positives, although in the cases where this algorithm increases the number of false negatives (classifying a true mass as normal tissue), the mean $A_z$ of the system is drastically reduced. For the other sizes, the performance is similar whether training and testing were done using the same database or different databases, and in some cases the performance is even better when using different databases.
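A per-size breakdown such as the one reported in the table can be obtained by grouping the per-case $A_z$ values by size cluster, as in the short sketch below; the field names and function name are hypothetical.

```python
def mean_az_per_size(case_results):
    """Average the per-case Az values within each size cluster.

    Assumed layout: each result is a dict with "size_cluster" and "az" keys."""
    grouped = {}
    for case in case_results:
        grouped.setdefault(case["size_cluster"], []).append(case["az"])
    return {size: sum(vals) / len(vals) for size, vals in grouped.items()}
```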