The initial experiments evaluate the proposed method against each expert's classification independently. We used a leave-one-woman-out methodology, i.e. the left and right mammograms of a woman are analyzed by a classifier trained on the mammograms of all other women in the database. This avoids bias, as the left and right mammograms of a woman are expected to have similar internal morphology [94]. The confusion matrices for the three classifiers (the SFS+kNN, C , and Bayesian approaches) are shown in Table , where each row corresponds to results based on the manual classification by an individual radiologist. In this work a value of was used for kNN. Other odd values ranging from to were tested and gave similar results.
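The leave-one-woman-out protocol can be sketched as a grouped cross-validation split. The following is a minimal illustration, not the authors' code: the function name `leave_one_woman_out` and the toy identifiers are hypothetical, and the only assumption is that each mammogram carries an identifier of the woman it belongs to.

```python
from collections import defaultdict


def leave_one_woman_out(woman_ids):
    """Yield (woman, train_indices, test_indices), holding out one woman per fold.

    All mammograms of the held-out woman (left and right) go to the test set,
    so no image of that woman is ever seen during training.
    """
    by_woman = defaultdict(list)
    for idx, woman in enumerate(woman_ids):
        by_woman[woman].append(idx)
    for woman, test_idx in by_woman.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(woman_ids)) if i not in held_out]
        yield woman, train_idx, test_idx


# Hypothetical toy data: three women, one left and one right mammogram each.
woman_ids = ["w1", "w1", "w2", "w2", "w3", "w3"]
folds = list(leave_one_woman_out(woman_ids))
for woman, train_idx, test_idx in folds:
    print(woman, "train:", train_idx, "test:", test_idx)
```

Splitting by woman rather than by image is what prevents the morphological similarity between the two breasts of one woman from leaking into the training set.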
For Expert A, we can see that the SFS+kNN correctly classifies about of the mammograms, while the C decision tree achieves of correct classification. kNN clearly outperforms C when classifying mammograms belonging to BIRADS II, while for the remaining BIRADS classes the performance is quite similar. On the other hand, C tends to assign mammograms either to their own BIRADS class or to a neighbouring one, while kNN shows a larger dispersion. The κ coefficient also reflects that kNN performs better than C , with values equal to and , respectively. Note that both classifiers belong to the Substantial category according to the scale in Table .
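As an illustration of how such an agreement coefficient is derived from a confusion matrix, the sketch below computes Cohen's kappa and maps it to a verbal category. It assumes that the coefficient in question is Cohen's kappa and that the scale is the commonly used Landis and Koch one; the confusion matrix is hypothetical toy data, not a result from this work.

```python
def cohen_kappa(cm):
    """Cohen's kappa for a square confusion matrix (rows: expert, columns: classifier)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal distributions.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Verbal agreement category, assuming the Landis and Koch scale."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical 4x4 confusion matrix for BIRADS I-IV (illustrative only).
toy_cm = [
    [20, 5, 0, 0],
    [4, 18, 3, 0],
    [0, 3, 16, 6],
    [0, 0, 5, 20],
]
kappa = cohen_kappa(toy_cm)
print(round(kappa, 3), landis_koch(kappa))
```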
The results obtained by the Bayesian classifier are shown in Table (c). This classifier shows an increase in the overall performance when compared to the individual classifiers, reaching correct classification. This is an increase of and when compared to kNN and C , respectively. When considering the individual BIRADS classes, the percentage of correct classification for BIRADS I is around , whilst in the other cases, the percentages are for BIRADS II, for BIRADS III, and for BIRADS IV. Note that with the Bayesian classifier, κ increases to .
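The overall and per-class percentages quoted in this kind of analysis follow directly from the confusion matrix. A minimal sketch, assuming rows hold the expert's label and columns the classifier's output; the matrix values and the function name `correct_rates` are hypothetical and serve only to show the arithmetic.

```python
def correct_rates(cm, labels):
    """Return (overall rate, per-class rates) of correct classification.

    The per-class rate is the diagonal entry divided by its row total,
    i.e. the fraction of mammograms of that class that were labelled correctly.
    """
    total = sum(sum(row) for row in cm)
    overall = sum(cm[i][i] for i in range(len(cm))) / total
    per_class = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(len(cm))}
    return overall, per_class


# Hypothetical confusion matrix (illustrative only, not a result of this work).
labels = ["BIRADS I", "BIRADS II", "BIRADS III", "BIRADS IV"]
toy_matrix = [
    [20, 5, 0, 0],
    [4, 18, 3, 0],
    [0, 3, 16, 6],
    [0, 0, 5, 20],
]
overall, per_class = correct_rates(toy_matrix, labels)
print(f"overall: {overall:.0%}")
for name, rate in per_class.items():
    print(f"{name}: {rate:.0%}")
```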
The results obtained for Expert B are slightly lower than those obtained for Expert A. Specifically, of the mammograms were correctly classified by using the SFS+kNN classifier, while the C results remained at . The better results for the kNN classifier are independent of the BIRADS classes, except for the BIRADS IV class, in which C clearly outperforms kNN. The Bayesian classifier shows a performance increase of and when compared to kNN and C , respectively, obtaining an overall performance of . When considering the individual BIRADS classes, the percentage of correct classification for BIRADS I is around , whilst for the other cases, the percentages are for BIRADS II, for BIRADS III, and for BIRADS IV. The κ value is equal to .
The last row of Table shows the results obtained for Expert C. The performance of the classifiers is similar to that obtained by using the ground truth of Expert B. The kNN classifier obtained correct classification, while C obtained . Using the Bayesian classifier, of the mammograms were correctly classified. In summary, this classifier achieves correct classification for BIRADS I, for BIRADS II, for BIRADS III, and for BIRADS IV. The κ value is equal to .
In conclusion, the best classification rates are obtained using the Bayesian combination. For each individual expert , , and correct classification are obtained, respectively.
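This section does not spell out the combination rule, so the following is only one plausible reading of a Bayesian combination of two classifiers: each classifier supplies a per-class likelihood estimate, the two are assumed conditionally independent given the class, and they are multiplied with the class prior and normalized. The function name `bayes_combine` and all numbers are hypothetical.

```python
def bayes_combine(prior, lik_a, lik_b):
    """Posterior over classes from a prior and two per-class likelihood estimates.

    Assumes the two classifiers' outputs are conditionally independent
    given the class (a naive Bayes style assumption, not confirmed by the text).
    """
    unnormalized = [p * a * b for p, a, b in zip(prior, lik_a, lik_b)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


classes = ["BIRADS I", "BIRADS II", "BIRADS III", "BIRADS IV"]
# Hypothetical values for a single mammogram (illustrative only).
prior = [0.25, 0.25, 0.25, 0.25]
lik_knn = [0.1, 0.6, 0.2, 0.1]    # e.g. fraction of neighbours per class
lik_tree = [0.2, 0.3, 0.4, 0.1]   # e.g. class frequencies in the reached leaf

posterior = bayes_combine(prior, lik_knn, lik_tree)
predicted = classes[posterior.index(max(posterior))]
print(predicted, [round(p, 3) for p in posterior])
```

In this toy case the two classifiers disagree on the most likely class, and the combination settles on the class that both consider reasonably probable.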
In line with other publications [17,137], we can reduce the four-class classification problem to the following two-class problem: {BIRADS I and II} vs {BIRADS III and IV}, in other words, low density (low risk) versus high density (high risk) classification. From a mammographic risk assessment point of view, this might be more appropriate than the four-class division. Taking Expert A as ground truth, the percentage of correct classification for low-density breasts is about for the three classifiers, while for dense breasts the percentage is , , and for the kNN, C and the Bayesian combination, respectively. In contrast, for Expert B, the correct classification percentage for low-density breasts is around for the single classifiers and for the combination, while for high-density breasts it is reduced to for each classifier, and for their combination. Finally, taking Expert C as ground truth, the correct classification percentage for low-density breasts is for the single classifiers and for the combination, while for high-density breasts the kNN obtains , and the other classifiers .
For this two-class approach, in summary, the results are , and of correct classification for Experts A, B and C, respectively.
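The reduction from four classes to two amounts to summing blocks of the four-class confusion matrix. A minimal sketch, assuming rows hold the expert's label and columns the classifier's output, with BIRADS I and II mapped to low density and BIRADS III and IV to high density; the matrix is hypothetical toy data and `collapse_to_two_class` is an illustrative name.

```python
def collapse_to_two_class(cm4):
    """Merge a 4x4 BIRADS confusion matrix into a 2x2 low/high density matrix."""
    group = [0, 0, 1, 1]  # BIRADS I, II -> low (0); BIRADS III, IV -> high (1)
    cm2 = [[0, 0], [0, 0]]
    for i, row in enumerate(cm4):
        for j, count in enumerate(row):
            cm2[group[i]][group[j]] += count
    return cm2


# Hypothetical four-class confusion matrix (illustrative only).
cm4 = [
    [20, 5, 0, 0],
    [4, 18, 3, 0],
    [0, 3, 16, 6],
    [0, 0, 5, 20],
]
cm2 = collapse_to_two_class(cm4)
low_rate = cm2[0][0] / sum(cm2[0])
high_rate = cm2[1][1] / sum(cm2[1])
two_class_overall = (cm2[0][0] + cm2[1][1]) / sum(sum(row) for row in cm2)
print(cm2, low_rate, high_rate, two_class_overall)
```

Confusions between neighbouring classes inside the same group (for example BIRADS I labelled as BIRADS II) no longer count as errors, which is why the two-class rates are higher than the four-class ones.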