We clustered the MLO views into four groups according to their mass size. Each group corresponds to the following intervals for mass sizes: …, …, …, and …. In each interval there were, respectively, …, …, …, and … masses. The intervals differ from those used for the MIAS database because the masses in this database are smaller.
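The grouping step described above can be sketched as a simple interval assignment. The boundary values below are purely illustrative placeholders, since the paper's actual interval bounds are not reproduced here:

```python
def group_by_size(mass_sizes, edges):
    """Assign each mass to a size interval.

    `edges` are hypothetical interval boundaries (not the paper's
    actual bounds); interval i covers [edges[i], edges[i + 1]).
    Returns one list of sizes per interval.
    """
    groups = [[] for _ in range(len(edges) - 1)]
    for size in mass_sizes:
        for i in range(len(edges) - 1):
            if edges[i] <= size < edges[i + 1]:
                groups[i].append(size)
                break
    return groups

# Illustrative sizes and boundaries only.
print(group_by_size([5, 15, 25, 35], [0, 10, 20, 30, 40]))
```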
As already mentioned, the database ground-truth is provided by six experts. Thus, we can compute the performance of the algorithm using those regions where different numbers of radiologists agree. Considering as a true mass those pixels where all six radiologists coincide, we obtained …. In contrast, if we consider as a mass those pixels where at least five radiologists agree, … was …. Decreasing the required agreement further, we obtained …, …, …, and … for …, …, and …, respectively. This shows an overall trend of decreasing performance as the number of radiologists in agreement decreases. This is due to the fact that a varying number of thin spicules appears when considering all radiologist annotations as ground-truth, whereas only the centre of the mass and the clear spicules are taken into account when considering as a mass those pixels where all radiologists coincide. In the rest of the evaluation with this database, only this last case is analyzed.
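The agreement-based ground truth described above can be obtained by pixel-wise voting over the six annotation masks. A minimal sketch, assuming binary masks and using the Dice coefficient as a stand-in for the paper's (unspecified) figure of merit:

```python
import numpy as np

def consensus_mask(annotations, min_agreement):
    """Binary ground-truth mask: pixels marked by at least
    `min_agreement` of the annotators (stacked binary masks)."""
    votes = np.sum(np.asarray(annotations, dtype=bool), axis=0)
    return votes >= min_agreement

def dice(pred, truth):
    """Dice overlap between two binary masks (hypothetical metric;
    the paper's own figure of merit may differ)."""
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Six illustrative random annotations and one detection mask.
rng = np.random.default_rng(0)
annotations = rng.random((6, 4, 4)) > 0.5
detection = rng.random((4, 4)) > 0.5
for k in range(6, 0, -1):
    truth = consensus_mask(annotations, k)
    print(k, dice(detection, truth))
```

Sweeping `min_agreement` from six down to one reproduces the evaluation protocol sketched above: stricter consensus keeps only the mass core, while looser consensus pulls in individually annotated thin spicules.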
The mean … over all mammograms for algorithms d1 and d2 was … and …, respectively, while our approach obtained …. Note that with this database the overall results obtained by algorithm d1 are in line with those obtained by the other algorithms. This is due to the fact that the masses in this database are smaller than in the MIAS database, and d1 performs well for small masses.
Table … shows the performance of the algorithms as a function of mass size. Note that the trend shown in the previous section for algorithms d1 and Eig still holds: they perform better for smaller masses than for larger ones. In contrast, algorithm d2 shows a similar behaviour for all sizes.