The performance of our approach is evaluated using a total of
mammograms extracted from the MIAS mammographic
database [169]. Among them,
show confirmed masses
(with the ground truth provided by an expert), while the rest are normal
mammograms. The MIAS database is used for evaluation because it
provides accurate manual annotations of the lesions. However, the
number of cases in MIAS is too small to build the statistical
models for the detection and false positive reduction steps when
breast density information is used. The
DDSM database [68], on the other hand, presents less
accurate annotations but a larger number of cases. For this
reason, the DDSM database has been used for statistical training
(recall that the algorithm does not need an accurate set of
manual annotations) and the MIAS for testing the mass detection
accuracy. Hence, two databases of RoIs have been extracted from
the DDSM database containing both masses and non-masses. We
clustered the DDSM database into
BIRADS classes, and each class
is further clustered according to lesion size. Again, we used six
size intervals:
, and the number of masses in each interval was
respectively,
,
,
,
,
, and
. Moreover,
for the false positive reduction learning step,
normal RoIs
for each mass RoI were included in each size-cluster.
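The grouping described above can be sketched as follows. This is only an illustration: the interval boundaries in `SIZE_EDGES`, the field names `birads` and `area`, and the helper names are hypothetical and are not the values or code used in this work.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical interval boundaries (arbitrary units); five edges give six bins.
SIZE_EDGES = [10.0, 20.0, 30.0, 40.0, 50.0]


def size_bin(area):
    """Return the index (0..5) of the size interval that contains `area`."""
    return bisect_right(SIZE_EDGES, area)


def cluster_rois(rois):
    """Group mass RoIs by (BIRADS class, size interval)."""
    clusters = defaultdict(list)
    for roi in rois:
        clusters[(roi["birads"], size_bin(roi["area"]))].append(roi)
    return dict(clusters)


def normals_needed(clusters, normals_per_mass):
    """Number of normal RoIs to add to each cluster for a given ratio."""
    return {key: normals_per_mass * len(members)
            for key, members in clusters.items()}
```

Each key of the resulting dictionary identifies one density class and one size interval, so a separate model can be trained per cluster.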
The evaluation is again done using Free-response Receiver Operating Characteristic (FROC) and Receiver Operating Characteristic (ROC) analysis. Recall that FROC analysis quantifies the ability of the algorithms to distinguish between mammograms with and without masses, while a ROC curve indicates the accuracy with which the masses are detected.
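As a reminder of how the FROC operating points are obtained, the following minimal sketch sweeps a threshold over the suspicion scores and reports, for each threshold, the sensitivity and the mean number of false positives per image. The data layout (a list of `(has_mass, detections)` pairs, each detection being a `(score, hits_mass)` pair) and the function name `froc_points` are assumptions made for illustration, and a mass mammogram is counted as detected when at least one retained region hits the annotated mass.

```python
def froc_points(images, thresholds):
    """Return one (threshold, sensitivity, false positives per image) per threshold.

    images: list of (has_mass, detections), where detections is a list of
            (score, hits_mass) pairs for the suspicious regions of one image.
    """
    n_images = len(images)
    n_mass_images = sum(1 for has_mass, _ in images if has_mass)
    points = []
    for t in thresholds:
        detected = 0
        false_positives = 0
        for has_mass, detections in images:
            # Keep only the regions whose score reaches the threshold.
            kept = [hits for score, hits in detections if score >= t]
            if has_mass and any(kept):
                detected += 1
            false_positives += sum(1 for hits in kept if not hits)
        sensitivity = detected / n_mass_images if n_mass_images else 0.0
        points.append((t, sensitivity, false_positives / n_images))
    return points
```

Plotting sensitivity against false positives per image over all thresholds gives the FROC curve.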
In Figure the performance of the presented
algorithms is evaluated. The grey line with pentagrams shows the
proposed template matching performance, which yields a high number of
false positives per image (regions marked as suspicious that are
actually normal tissue). This number is clearly reduced by the false
positive reduction algorithm (the black line with pentagrams). The
lines with hexagrams are obtained when including the breast
density information. Note that including the breast density
information improves the performance of both approaches. For
instance, at a sensitivity of
the performance for the
algorithm without false positive reduction increases from
false positives per image to
, while with the false
positive reduction step it goes from
to
.
A comparison between our approach and the algorithms d1 and
d2 is also provided in
Figure . Our approach (the black
line with hexagrams) outperforms both algorithms, obtaining an
intermediate performance between the results shown in
Figure
and in
Figure
. One should recall
that those figures were related to results using the same database
for both training and testing
(Figure
) and to results using
different databases
(Figure
). For instance, at
a sensitivity of
now the mean number of false positives per
image is
, which is an intermediate value between
when training and testing using the same database and
when
using different databases.
Once the mammograms containing masses are detected, a ROC curve
and the corresponding
analysis is performed. The overall
performance over the
mammograms containing masses resulted in
a
value of
and
without and
with considering breast density information, respectively. Thus,
introducing this information has two effects: firstly,
mean
is increased, and secondly, the deviation is reduced, showing that
this information is also beneficial in those cases where the
algorithm has a lower accuracy. Compared with algorithms d1
(
) and d2 (
) the
proposal is clearly better than d1 and is similar to
d2, despite the drawback of being trained and tested using
different databases.
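The per-mammogram summary used in this comparison can be sketched as below. The sketch assumes that the summarised quantity is the area under the ROC curve, computed here with the rank-based (Mann-Whitney) estimate, and that each mammogram supplies a list of scores with binary ground-truth labels; the function names and this data layout are illustrative assumptions.

```python
from statistics import mean, pstdev


def roc_area(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, with ties counted as one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute a ROC area")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarise(per_mammogram):
    """Mean and (population) standard deviation of the per-mammogram areas."""
    areas = [roc_area(scores, labels) for scores, labels in per_mammogram]
    return mean(areas), pstdev(areas)
```

A higher mean together with a lower deviation indicates an improvement that also reaches the mammograms on which the algorithm is least accurate.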
Table shows the effect of the lesion
size for the different algorithms in terms of mean and standard
deviation of the
values. Our proposal performs
similarly for every size except the range
,
which obtains the best results. Including the breast
density information improves the results in all classes except for the
group. This can be explained by the fact that the system
has more information about the shapes of larger masses when there
are more instances in the training database. Note also that the
problem of the smallest size described in
Section
is now partially solved,
obtaining also better results than algorithm d2.
We include in Table a comparison of the
performance of the algorithm according to the breast density. We
detail in the table both BIRADS categories (using the consensus
opinion of three different radiologists) and the
fatty/glandular/dense annotations found in the MIAS database. Note
that, regardless of the classification criterion used, the
performance of the algorithm is largely independent of this factor.
For instance, for the BIRADS categories all classes behave
similarly except BIRADS II, where the algorithm performs slightly
better. For the three-class annotations, the dense class performs
slightly worse than the other two. This lower performance for the
dense class is less evident with the BIRADS categories because the
mammograms on which the algorithm performs slightly worse are
distributed between BIRADS III and IV.
In Chapter we also concluded that not only the
lesion size and the breast tissue but also the shape of the mass
affect the performance of the algorithms. In that sense,
Table
shows the performance of the
algorithm according to the mass shape: circular or spiculated.
Note that the algorithm performs slightly better for circular
masses than for spiculated ones, although this difference is not
significant.