Results Obtained Including Breast Tissue Information

The performance of our approach is evaluated using a total of

mammograms extracted from the MIAS mammographic database [169]. Among them,

show confirmed masses (with the ground truth provided by an expert), while the rest are normal mammograms. Note that the MIAS database is used for evaluation because it provides accurate manual annotations of the lesions. However, the number of cases in MIAS is too small to build the statistical models for the detection and false positive reduction steps using breast density information. The DDSM database [68], on the other hand, has less accurate annotations but a larger number of cases. For this reason, the DDSM database is used for statistical training (recall that the algorithm does not need an accurate set of manual annotations) and MIAS for testing the mass detection accuracy. Hence, two databases of RoIs have been extracted from the DDSM database, containing both masses and non-masses. We cluster the DDSM database into

BIRADS classes, and each class is further clustered according to lesion size. Again, we used six intervals of lesion size:

, and the number of masses in each interval was, respectively,

, and

. Moreover, for the false positive reduction learning step,

normal RoIs for each mass RoI were included in each size-cluster.
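The two-level grouping of training RoIs described above (first by BIRADS class, then by lesion size interval) can be sketched as follows. This is only an illustration: the interval edges, the tuple layout and the function names are hypothetical placeholders, not the values or code used in this work.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical interval edges (same unit as the lesion area). Five edges
# define six half-open intervals [lo, hi); the real bounds are those
# given in the text, not these placeholder values.
SIZE_EDGES = [1.0, 2.0, 3.0, 4.0, 5.0]


def size_bin(area, edges=SIZE_EDGES):
    """Return the index (0..len(edges)) of the interval containing `area`."""
    return bisect_right(edges, area)


def cluster_rois(rois, edges=SIZE_EDGES):
    """Group RoIs by (BIRADS class, size interval).

    `rois` is an iterable of (roi_id, birads_class, area) tuples.
    """
    clusters = defaultdict(list)
    for roi_id, birads, area in rois:
        clusters[(birads, size_bin(area, edges))].append(roi_id)
    return dict(clusters)


# Toy RoIs: two small masses in BIRADS I, one mid-size in III, one large in IV.
rois = [("a", 1, 0.5), ("b", 1, 0.7), ("c", 3, 2.5), ("d", 4, 7.0)]
clusters = cluster_rois(rois)
```

Each resulting cluster would then be used to train its own statistical model, so a test mammogram only needs the models that match its breast density class.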

The evaluation is again done using Free-response Receiver Operating Characteristic (FROC) and Receiver Operating Characteristic (ROC) analysis. Recall that FROC analysis quantifies the ability of the algorithms to distinguish between mammograms with and without masses, while a ROC curve indicates the accuracy with which the masses are detected.
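As a reminder of how the points of a FROC curve are obtained, the sketch below sweeps a threshold over the detection scores and reports, for each threshold, the mean number of false positives per image and the sensitivity. It is a generic illustration under simple assumptions (detections pooled over all images, each one already labelled with the lesion it hits or `None`); the function name and data layout are hypothetical and this is not the evaluation code used to produce the figures.

```python
def froc_points(detections, n_lesions, n_images, thresholds):
    """Return one (false positives per image, sensitivity) pair per threshold.

    `detections` is a list of (score, lesion_id) pairs pooled over all
    images; lesion_id is None for a detection lying on normal tissue.
    """
    points = []
    for t in thresholds:
        kept = [(s, lesion) for s, lesion in detections if s >= t]
        # A lesion counts once, no matter how many detections hit it.
        hit = {lesion for _, lesion in kept if lesion is not None}
        n_fp = sum(1 for _, lesion in kept if lesion is None)
        points.append((n_fp / n_images, len(hit) / n_lesions))
    return points


# Toy example: 2 lesions ("m1", "m2") in a set of 4 images.
detections = [(0.9, "m1"), (0.8, None), (0.6, "m2"), (0.4, None), (0.3, "m1")]
points = froc_points(detections, n_lesions=2, n_images=4,
                     thresholds=[0.85, 0.5, 0.2])
```

Lowering the threshold moves along the curve towards higher sensitivity at the cost of more false positives per image, which is the trade-off read off the figures in this section.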

In Figure

the performance of the presented algorithms is evaluated. The grey line with pentagrams shows the performance of the proposed template matching, which yields a high number of false positives per image (regions marked as suspicious that are actually normal tissue). This number is clearly reduced by the false positive reduction algorithm (the black line with pentagrams). The lines with hexagrams are obtained when including the breast density information. Note that including the breast density information improves the performance of both approaches. For instance, at a sensitivity of

the algorithm without false positive reduction improves from

false positives per image to

, while with the false positive reduction step it goes from

**Figure 6.1:** FROC analysis of the proposed algorithms over the set of mammograms. The grey lines show the results obtained using the template matching algorithm, while the black ones show the proposed algorithm with false positive reduction. Lines with pentagrams are the results of the algorithm without the breast density information, while lines with hexagrams are those obtained when including this information.

A comparison between our approach and the algorithms d1 and d2 is also provided in Figure

. Our approach (the black line with hexagrams) outperforms both algorithms, obtaining an intermediate performance between the results shown in Figure

and in Figure

. Recall that those figures correspond to results obtained using the same database for both training and testing (Figure

) and to results using different databases (Figure

). For instance, at a sensitivity of

the mean number of false positives per image is now

, which is an intermediate value between

when training and testing using the same database and

when using different databases.

**Figure 6.2:** FROC analysis of the algorithm over the set of mammograms. The two lines with rhombuses show the results obtained using d1 and d2, while the line with hexagrams shows the proposed algorithm.

Once the mammograms containing masses are detected, a ROC curve and the corresponding $A_z$ analysis are computed. The overall performance over the

mammograms containing masses resulted in an $A_z$ value of $86.2\pm 7.3$ and $88.0\pm 6.4$ without and with breast density information, respectively. Thus, introducing this information has two effects: first, the mean $A_z$ increases and, second, its deviation decreases, showing that this information is also beneficial in those cases where the algorithm has a lower accuracy. Compared with algorithms d1 ( $A_z = 84.1 \pm 7.9$ ) and d2 ( $A_z = 88.1 \pm 8.4$ ), the proposal is clearly better than d1 and similar to d2, despite the drawback of being trained and tested using different databases.
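To make the "mean $\pm$ deviation" figures concrete, the following sketch computes one $A_z$ value per mammogram and then summarises them. It rests on several assumptions that the text does not confirm: the area under the ROC curve is estimated with the Mann-Whitney statistic (rather than, for example, a fitted binormal curve), the scores are toy values for pixels inside and outside the annotated mass, and the sample standard deviation is used. All names are hypothetical.

```python
from statistics import mean, stdev


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy per-mammogram scores: (scores inside the mass, scores outside it).
cases = [
    ([0.9, 0.8], [0.1, 0.2]),
    ([0.9, 0.3], [0.5, 0.1]),
    ([0.7, 0.6], [0.6, 0.2]),
]
# A_z expressed on a 0-100 scale, as in the values reported in the text.
az = [100.0 * auc(pos, neg) for pos, neg in cases]
summary = (mean(az), stdev(az))
```

With the values reported above, the same summary reads as a gain of 1.8 points in the mean (from 86.2 to 88.0) and a reduction of 0.9 points in the deviation (from 7.3 to 6.4) when breast density information is included.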

Table

shows the effect of the lesion size for the different algorithms in terms of the mean and standard deviation of the $A_z$ values. Our proposal performs similarly for each size except for the range

, which obtains the best results. The inclusion of the breast density improves the results in all classes except for the

group. This can be explained by the fact that the system has more information about the shapes of larger masses when there are more instances in the training database. Note also that the problem of the smallest size described in Section

is now partially solved, obtaining also better results than algorithm d2.

Table 6.1: Influence of the lesion size (in

) for algorithms d1, d2 and the proposed algorithm with and without breast density information. The results show the mean and standard deviation of the $A_z$ values.

		Lesion Size (in )

		1.20	1.20-1.80	1.80-3.60	3.60

	d1	$92.1\pm 5.5$	$85.8\pm 8.2$	$82.4\pm 7.3$	$79.1\pm 7.2$
	d2	$84.9\pm 8.8$	$86.7\pm 8.1$	$89.1\pm 9.6$
	*Eig $\&$ FPRed*	$81.4\pm 9.4$	$89.4\pm 3.9$	$86.0\pm 5.5$
	*Eig $\&$ FPRed $\&$ BDI*	$86.6\pm 9.3$	$91.1\pm 4.3$	$87.2\pm 6.1$

We include in Table

a comparison of the performance of the algorithm according to the breast density. The table details both the BIRADS categories (using the consensus opinion of three different radiologists) and the fatty/glandular/dense annotations found in the MIAS database. Note that, regardless of the classification criterion used, the performance of the algorithm is largely independent of this factor. For instance, for the BIRADS categories all classes behave similarly except BIRADS II, where the algorithm performs slightly better. For the three-class annotations, the dense class performs slightly worse than the other two. This lower performance in the dense class is less evident using the BIRADS categories because the mammograms where the algorithm performs slightly worse are distributed between BIRADS III and IV.

Table 6.2: Influence of the breast tissue on the performance of the proposed algorithm. The top table uses the breast tissue annotations found in the MIAS database, while the bottom table uses the BIRADS categories. The results show the mean and standard deviation of the $A_z$ values.

| Breast Tissue | Fatty | Glandular | Dense |
|---|---|---|---|
| *Eig $\&$ FPRed $\&$ BDI* | $88.3\pm 8.7$ | $88.5\pm 7.1$ | $85.8\pm 7.3$ |

| | BIRADS I | BIRADS II | BIRADS III | BIRADS IV |
|---|---|---|---|---|
| *Eig $\&$ FPRed $\&$ BDI* | $87.2\pm 6.7$ | $89.3\pm 4.2$ | $86.7\pm 4.5$ | $87.3\pm 7.2$ |

In Chapter

we also concluded that not only the lesion size and the breast tissue but also the shape of the mass affect the performance of the algorithms. In that sense, Table

shows the performance of the algorithm according to the mass shape: circular or spiculated. Note that the algorithm performs slightly better for circular masses than for spiculated ones, although this difference is not significant.

Table 6.3: Influence of the lesion shape for the proposed algorithm. The results show the mean and standard deviation of the $A_z$ values.

| Lesion Shape | Circular | Spiculated |
|---|---|---|
| *Eig $\&$ FPRed $\&$ BDI* | $88.2\pm 9.3$ | $87.4\pm 8.7$ |