Next: The Importance of the Up: Results Previous: Results Based on Consensus   Contents


DDSM Database

The developed methodology was also evaluated on a set of $831$ mammograms taken from the Digital Database for Screening Mammography (DDSM) [67], with the main objective of demonstrating the robustness of our proposal on a different and larger data set. As with the MIAS database, DDSM provides additional information for each mammogram, including the density of the breast. In contrast to MIAS, this information is already expressed using the BIRADS categories.

The number of mammograms belonging to each category is $106$ $(13\%)$, $336$ $(40\%)$, $255$ $(31\%)$, and $134$ $(16\%)$ for BIRADS I to IV, respectively. These proportions are consistent with the numbers reported by ongoing screening programs. For instance, in the work of Lehman et al. [102], where a population of $46{,}340$ women was studied, $13.6\%$ were BIRADS I, $50.9\%$ BIRADS II, $30.1\%$ BIRADS III, and $5.5\%$ BIRADS IV. Although these percentages vary with the age of the women, classes II and III tend to be larger than classes I and IV [33,62,186].

The DDSM database provides four mammograms (MLO left and right, CC left and right) for most women. To avoid bias we selected only the right MLO mammogram for each woman. This way, the leave-one-woman-out methodology used for evaluating the system in the previous sections reduces to the typical leave-one-image-out evaluation methodology.
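The reduction from leave-one-woman-out to leave-one-image-out can be illustrated with a minimal sketch. The woman identifiers and the helper name `leave_one_group_out` are hypothetical and are not taken from the evaluated system; the sketch only shows that, once each woman contributes a single image, every held-out group contains exactly one image.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group per fold.

    `groups` has one entry per image: the identifier of the woman the
    image belongs to (hypothetical identifiers, for illustration only).
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for group, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != group]
        yield train_idx, test_idx


# Two images per woman: each fold holds out both images of one woman.
two_views_per_woman = ["w1", "w1", "w2", "w2", "w3", "w3"]
folds_grouped = list(leave_one_group_out(two_views_per_woman))

# One image per woman (e.g. only the right MLO view): each fold holds out
# a single image, i.e. plain leave-one-image-out.
one_view_per_woman = ["w1", "w2", "w3"]
folds_single = list(leave_one_group_out(one_view_per_woman))

for train_idx, test_idx in folds_single:
    print("train:", train_idx, "test:", test_idx)
```

Holding out by woman matters only when several images of the same woman are present, since otherwise one of her views could be used for training while another is tested.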

Using this evaluation strategy, Table 3.6 shows the results obtained with the classifiers. These results show a slightly reduced performance when compared to the results based on the MIAS database (see Tables [*] and [*]). Specifically, the performance obtained by the classifiers is $70\%$, $72\%$, and $77\%$ for kNN, C4.5, and Bayesian combination, respectively. Note that with this database C4.5 performs better than kNN. This may be due to the use of more mammograms and a different distribution over the BIRADS classes in the training set. The $\kappa$ value, equal to $0.67$, indicates a Substantial agreement between the manual and the automatic Bayesian classification.


Table 3.6: Confusion matrices for DDSM classification according to BIRADS categories. The results are based on a leave-one-image-out methodology with $831$ mammograms. (a) kNN classifier, (b) C4.5 decision tree, and (c) Bayesian classifier. Rows correspond to the manual annotation and columns to the automatic classification.

(a) kNN ($70\%$, $\kappa = 0.56$)

          B-I   B-II   B-III   B-IV
B-I        54     40      12      0
B-II       44    266      25
B-III       9     60     177
B-IV        0     21      30

(b) C4.5 ($72\%$, $\kappa = 0.59$)

          B-I   B-II   B-III   B-IV
B-I        51     30      25      0
B-II       22    279      35      0
B-III      16     59     178      2
B-IV        8     14      25     87

(c) Bayesian ($77\%$, $\kappa = 0.67$)

          B-I   B-II   B-III   B-IV
B-I        58     25      23      0
B-II       15    295      26      0
B-III      12     46     196      1
B-IV        5     18      18     93
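As a check on the figures quoted for each classifier, the following sketch recomputes the overall accuracy and Cohen's $\kappa$ from the confusion matrices of Table 3.6. Two points are assumptions of this sketch rather than statements from the text: the function names are illustrative, and the B-IV column entries that are absent from the kNN rows B-II to B-IV are inferred from the class totals ($106$, $336$, $255$, $134$), on the premise that every row sums to the number of mammograms in that class, as is the case for the other two matrices.

```python
CLASS_TOTALS = [106, 336, 255, 134]  # BIRADS I to IV, as stated in the text

# kNN matrix as printed; None marks an entry absent from the table.
KNN_PRINTED = [
    [54, 40, 12, 0],
    [44, 266, 25, None],
    [9, 60, 177, None],
    [0, 21, 30, None],
]
C45 = [[51, 30, 25, 0], [22, 279, 35, 0], [16, 59, 178, 2], [8, 14, 25, 87]]
BAYES = [[58, 25, 23, 0], [15, 295, 26, 0], [12, 46, 196, 1], [5, 18, 18, 93]]


def fill_from_row_totals(cm, totals):
    """Infer a single missing entry per row (assumption: rows sum to totals)."""
    filled = []
    for row, total in zip(cm, totals):
        known = sum(v for v in row if v is not None)
        filled.append([total - known if v is None else v for v in row])
    return filled


def accuracy_and_kappa(cm):
    """Return (observed agreement, Cohen's kappa) for a square matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the marginal distributions.
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


KNN = fill_from_row_totals(KNN_PRINTED, CLASS_TOTALS)
results = {
    name: accuracy_and_kappa(cm)
    for name, cm in [("kNN", KNN), ("C4.5", C45), ("Bayesian", BAYES)]
}
for name, (acc, kappa) in results.items():
    print(f"{name}: accuracy={acc:.3f}, kappa={kappa:.2f}")
```

Under these assumptions the recomputed values round to the accuracies and $\kappa$ values given in the table headers, which also supports the inferred kNN entries.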


Examining each class alone for the Bayesian combination, BIRADS I reached $55\%$ correct classification, BIRADS II $88\%$, BIRADS III $77\%$, and BIRADS IV $69\%$. In contrast to the MIAS database, here BIRADS I shows the worst results, whilst BIRADS II shows the best. We believe that this result is due to the fact that, in the DDSM database, mammograms belonging to BIRADS I have tissue very similar to that of mammograms belonging to BIRADS II. Regarding the classification of dense mammograms, those belonging to BIRADS III are better classified than those belonging to BIRADS IV. Moreover, only one mammogram not belonging to BIRADS IV is misclassified as this class.
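The per-class percentages correspond to the diagonal of the Bayesian confusion matrix (Table 3.6(c)) divided by the row totals. A minimal sketch of that computation follows; the variable names are illustrative, and the matrix is read with rows as the manual annotation and columns as the automatic result.

```python
LABELS = ["B-I", "B-II", "B-III", "B-IV"]

# Bayesian confusion matrix from Table 3.6(c): rows are the manual
# annotation, columns the automatic classification.
BAYES = [
    [58, 25, 23, 0],
    [15, 295, 26, 0],
    [12, 46, 196, 1],
    [5, 18, 18, 93],
]

# Fraction of each annotated class that is classified correctly.
per_class = {
    label: row[i] / sum(row)
    for i, (label, row) in enumerate(zip(LABELS, BAYES))
}

# Mammograms from other classes that were assigned to BIRADS IV.
wrongly_assigned_to_biv = sum(BAYES[i][3] for i in range(3))

for label, value in per_class.items():
    print(f"{label}: {100 * value:.1f}%")
print("non-B-IV classified as B-IV:", wrongly_assigned_to_biv)
```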

Using the low/high density division, low density mammograms are $89\%$ correctly classified, while high density ones reach $79\%$. Compared to the MIAS consensus results, the reduction in performance comes mainly from the high density mammograms, whilst a similar classification rate is obtained for the low density mammograms.
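The low/high density figures follow from collapsing the four-class Bayesian matrix into two classes, taking BIRADS I and II as low density and BIRADS III and IV as high density. The sketch below shows one way to do this; the helper name `collapse` and the grouping list are illustrative choices, not part of the described system.

```python
# Bayesian confusion matrix from Table 3.6(c): rows are the manual
# annotation, columns the automatic classification.
BAYES = [
    [58, 25, 23, 0],
    [15, 295, 26, 0],
    [12, 46, 196, 1],
    [5, 18, 18, 93],
]

# Index of the merged class for each BIRADS category:
# 0 = low density (BIRADS I, II), 1 = high density (BIRADS III, IV).
GROUP_OF = [0, 0, 1, 1]


def collapse(cm, group_of):
    """Sum the cells of `cm` into a smaller matrix of merged classes."""
    k = max(group_of) + 1
    merged = [[0] * k for _ in range(k)]
    for i, row in enumerate(cm):
        for j, value in enumerate(row):
            merged[group_of[i]][group_of[j]] += value
    return merged


low_high = collapse(BAYES, GROUP_OF)
low_rate = low_high[0][0] / sum(low_high[0])
high_rate = low_high[1][1] / sum(low_high[1])

print(low_high)
print(f"low density: {100 * low_rate:.1f}%, high density: {100 * high_rate:.1f}%")
```

Errors between BIRADS I and II, or between BIRADS III and IV, no longer count as errors after merging, which is why both rates are higher than the corresponding four-class figures.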


Arnau Oliver 2008-06-17