The developed methodology was also evaluated on a set of
mammograms taken from the Digital Database for Screening
Mammography (DDSM) [67], with the main objective of
demonstrating the robustness of our proposal on a different and
larger data set. As with the MIAS database, DDSM provides
additional information for each mammogram, including the breast
density. In contrast to MIAS, however, this information is already
expressed in terms of the BIRADS categories.
The number of mammograms belonging to each category is:
,
,
, and
for BIRADS I
to IV, respectively. These proportions are consistent with the
numbers reported by ongoing screening programs. As shown in the
work of Lehman et al. [102], where a population of
women was studied,
were BIRADS I,
BIRADS II,
BIRADS III, and
BIRADS IV. Although
these percentages vary with the age of the women, classes II and
III tend to be larger than classes I and
IV [33,62,186].
The DDSM database provides four mammograms (MLO left and right, CC left and right) for most women. To avoid bias, we selected only the right MLO mammogram of each woman. In this way, the leave-one-woman-out strategy used to evaluate the system in the previous sections reduces to the standard leave-one-image-out evaluation methodology.
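The following is a minimal sketch of why keeping a single image per woman makes leave-one-image-out equivalent to leave-one-woman-out: no test image can share a woman with any training image. The `Mammogram` record layout, the helper names and the toy 1-NN classifier are hypothetical illustrations, not the authors' implementation or a DDSM data format.

```python
from collections import namedtuple

# Hypothetical record layout for illustration; DDSM is not distributed in this form.
Mammogram = namedtuple("Mammogram", ["woman_id", "view", "side", "features", "birads"])


def select_right_mlo(mammograms):
    """Keep one image per woman: the first right MLO view found for her."""
    selected = {}
    for m in mammograms:
        if m.view == "MLO" and m.side == "R" and m.woman_id not in selected:
            selected[m.woman_id] = m
    return list(selected.values())


def leave_one_image_out(samples, train_and_predict):
    """Classify each sample with a model trained on all remaining samples.

    With one image per woman this is equivalent to leave-one-woman-out.
    Returns a list of (true_label, predicted_label) pairs.
    """
    results = []
    for i, test in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        results.append((test.birads, train_and_predict(train, test.features)))
    return results


def nearest_neighbour(train, features):
    """Toy 1-NN stand-in classifier using squared Euclidean distance."""
    best = min(train, key=lambda m: sum((a - b) ** 2 for a, b in zip(m.features, features)))
    return best.birads


# Invented toy data: woman 1 has two images, only her right MLO is kept.
images = [
    Mammogram(1, "MLO", "R", (0.10,), 1),
    Mammogram(1, "CC", "R", (0.10,), 1),
    Mammogram(2, "MLO", "R", (0.90,), 4),
    Mammogram(3, "MLO", "R", (0.15,), 1),
    Mammogram(4, "MLO", "R", (0.85,), 4),
]
pairs = leave_one_image_out(select_right_mlo(images), nearest_neighbour)
```

Any classifier (kNN, a decision tree, or a Bayesian combination) can be plugged in through `train_and_predict`, since the split logic is independent of the model.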
Using this evaluation strategy, Table
shows the results obtained with the classifiers. These results
show slightly lower performance than that obtained on the MIAS
database (see
Tables
and
). To be specific, the performance
obtained by the classifiers is
,
, and
for kNN,
C
, and the Bayesian combination, respectively. Note that on
this database, C
performs better than
kNN. This may be due to the larger number of mammograms and the
different distribution of the BIRADS classes in the training set. The
value, equal to
, indicates substantial
agreement between the manual and the automatic Bayesian
classification.
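As a sketch, assuming the agreement statistic above is Cohen's κ read on the Landis and Koch scale (where 0.61 to 0.80 is labelled "substantial"), it can be computed from a confusion matrix of manual versus automatic labels as follows. The function names are hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: manual, cols: automatic)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # raw agreement
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"
```

For example, an invented 2x2 matrix `[[20, 5], [10, 15]]` has 70% raw agreement but 50% chance agreement, giving κ = 0.4 ("Fair"); κ discounts agreement that would arise by chance alone.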
Examining each class alone, BIRADS I reached
correct
classification, BIRADS II
, BIRADS III
, and BIRADS IV
. In contrast to the MIAS database, here BIRADS I shows the
worst results, whilst BIRADS II shows the best. We believe this is
because, in the DDSM database, mammograms belonging to BIRADS I
have tissue very similar to that of mammograms belonging to
BIRADS II. Regarding dense mammograms, those belonging to BIRADS III
are better classified than those belonging to BIRADS IV. Moreover,
only one mammogram not belonging to BIRADS IV is misclassified as
BIRADS IV.
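The per-class figures above are row-wise accuracies of the confusion matrix, and the remark about BIRADS IV counts off-diagonal entries in its column. A minimal sketch follows; the 4x4 matrix is invented for illustration and does not reproduce the DDSM results, and the helper names are hypothetical.

```python
# Invented 4x4 confusion matrix (rows: manual BIRADS I-IV, cols: automatic).
# These counts are illustrative only, not the results reported in the text.
CONFUSION = [
    [6, 4, 0, 0],
    [1, 9, 0, 0],
    [0, 1, 8, 1],
    [0, 0, 3, 7],
]
LABELS = ["I", "II", "III", "IV"]


def per_class_accuracy(confusion, labels):
    """Fraction of each true class (row) that is correctly classified."""
    return {label: row[i] / sum(row)
            for i, (label, row) in enumerate(zip(labels, confusion))}


def predicted_as(confusion, j):
    """Number of samples from other true classes that were assigned to class j."""
    return sum(confusion[i][j] for i in range(len(confusion)) if i != j)


accuracy = per_class_accuracy(CONFUSION, LABELS)
wrongly_in_iv = predicted_as(CONFUSION, 3)  # column index 3 is BIRADS IV
```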
Using the low/high density division, low density mammograms are
correctly classified, while high density ones reach a
. Compared with the MIAS consensus results, the drop in
performance is concentrated in the high-density mammograms,
whereas a similar classification performance is obtained for the
low-density mammograms.
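The low/high density results can be obtained by collapsing the four-class confusion matrix into two classes. The sketch below assumes the grouping low = BIRADS I and II, high = BIRADS III and IV; that grouping, the helper names and the matrix counts are illustrative assumptions, not values taken from the experiments.

```python
# Invented counts for illustration (rows: manual BIRADS I-IV, cols: automatic).
CONFUSION = [
    [6, 4, 0, 0],
    [1, 9, 0, 0],
    [0, 1, 8, 1],
    [0, 0, 3, 7],
]


def collapse_to_low_high(confusion, low=(0, 1), high=(2, 3)):
    """Merge BIRADS classes into low/high density (assumed: I+II low, III+IV high)."""
    groups = (low, high)
    return [[sum(confusion[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


def row_accuracy(confusion):
    """Correct-classification rate of each true class (row)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


two_class = collapse_to_low_high(CONFUSION)
low_acc, high_acc = row_accuracy(two_class)
```

Confusions between BIRADS I and II (or between III and IV) disappear after collapsing, which is why two-class accuracies are typically higher than the four-class ones.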