

Results Based on Individual Manual Classification

The initial experiments evaluate the proposed method against each expert's classification independently. We used a leave-one-woman-out methodology: the left and right mammograms of a woman are analyzed by a classifier trained on the mammograms of all other women in the database. This avoids bias, because the left and right mammograms of a woman are expected to have similar internal morphology [94]. Table 3.4 shows the confusion matrices for the three classifiers (SFS+kNN, C4.5, and the Bayesian approach), where each row of the table corresponds to results based on the manual classification by an individual radiologist. In this work a value of $k=7$ was used for kNN. Other odd values ranging from $5$ to $15$ were tested and gave similar results.
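The leave-one-woman-out protocol can be illustrated with a minimal sketch. The toy data below (six hypothetical women, one feature per mammogram, two density classes) and the helper names `leave_one_woman_out` and `knn_predict` are assumptions made for illustration only. They are not the features, the SFS selection step, or the implementation used in this work; only the value $k=7$ is taken from the text.

```python
from collections import Counter

def leave_one_woman_out(woman_ids):
    """Yield (train, test) index lists, holding out both mammograms of one woman."""
    for held_out in sorted(set(woman_ids)):
        train = [i for i, w in enumerate(woman_ids) if w != held_out]
        test = [i for i, w in enumerate(woman_ids) if w == held_out]
        yield train, test

def knn_predict(train_x, train_y, query, k=7):
    """Majority vote among the k training samples nearest to `query`."""
    def sq_dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, query))
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: 6 women, a left and a right mammogram each, one feature.
woman_ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
features = [[0.10], [0.12], [0.15], [0.11], [0.20], [0.18],
            [0.80], [0.85], [0.90], [0.88], [0.95], [0.92]]
labels = ["B-I"] * 6 + ["B-IV"] * 6

folds = list(leave_one_woman_out(woman_ids))
predictions = [None] * len(labels)
for train, test in folds:
    train_x = [features[i] for i in train]
    train_y = [labels[i] for i in train]
    for i in test:
        # The classifier never sees the other breast of the woman under test.
        predictions[i] = knn_predict(train_x, train_y, features[i], k=7)

correct = sum(p == t for p, t in zip(predictions, labels))
print(f"{correct}/{len(labels)} correct over {len(folds)} folds")
```

Splitting by woman rather than by image is the point of the sketch: a plain leave-one-image-out split would leave the contralateral mammogram in the training set, which is the source of bias described above.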


Table 3.4: Confusion matrices for MIAS classification according to BIRADS categories, using each mammographic expert's classification as ground truth. The results are based on a leave-one-woman-out methodology with $322$ mammograms. (a) kNN classifier, (b) C4.5 decision tree, and (c) Bayesian classifier. Rows of each matrix give the expert's label and columns the automatic classification.

Expert A
            kNN ($78\%, \kappa = 0.70$)         C4.5 ($74\%, \kappa = 0.64$)        Bayesian ($83\%, \kappa = 0.76$)
            B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV
B-I         113      10       5       1         114      12       2       1         118       6       5       0
B-II          8      59       9       3          18      47      12       2           7      60      10       2
B-III         4      13      46       7           2      11      48       9           0       6      53      11
B-IV          1       3       6      34           0       1      13      30           0       2       7      35

Expert B
            kNN ($74\%, \kappa = 0.64$)         C4.5 ($67\%, \kappa = 0.55$)        Bayesian ($80\%, \kappa = 0.73$)
            B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV
B-I          75       8       2       1          69      15       2       0          78       6       2       0
B-II          7      85      16       4          13      73      22       4          10      93       8       1
B-III         1      20      55       5           1      27      46       7           0      16      55      10
B-IV          2       7      11      23           0       1      13      29           0       1      10      32

Expert C
            kNN ($74\%, \kappa = 0.63$)         C4.5 ($72\%, \kappa = 0.58$)        Bayesian ($82\%, \kappa = 0.73$)
            B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV         B-I    B-II   B-III    B-IV
B-I          50       5       1       3          43      14       0       2          51       5       1       2
B-II         13      53      19       1          15      49      22       0           9      64      12       1
B-III         0      21     115       7           2      15     119       7           1      16     122       4
B-IV          3       3       7      21           1       0      13      20           0       2       6      26
                     (a)                                 (b)                                 (c)


For Expert A, SFS+kNN correctly classifies about $78\%$ of the mammograms, while the C4.5 decision tree achieves $74\%$. kNN clearly outperforms C4.5 on mammograms belonging to BIRADS II, while for the remaining BIRADS categories the performance is quite similar. On the other hand, C4.5 tends to assign mammograms to their own or a neighbouring BIRADS category, while kNN shows a larger dispersion. The $\kappa$ coefficient also indicates that kNN performs better than C4.5, with values of $0.70$ and $0.64$, respectively. Note that both values belong to the Substantial category according to the scale in Table [*].

The results obtained by the Bayesian classifier are shown in Table 3.4(c). This classifier improves on the overall performance of the individual classifiers, reaching $83\%$ correct classification, an increase of $5$ and $9$ percentage points over kNN and C4.5, respectively. Considering the individual BIRADS classes, the percentage of correct classification is around $91\%$ for BIRADS I, $76\%$ for BIRADS II, $76\%$ for BIRADS III, and $80\%$ for BIRADS IV. Note that with the Bayesian classifier, $\kappa$ increases to $0.76$.
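As a check on how these figures follow from the confusion matrix, the sketch below recomputes the overall rate, the per-class rates, and $\kappa$ from the Bayesian matrix for Expert A in Table 3.4(c). It assumes that rows hold the expert's label and that $\kappa$ is the unweighted Cohen's kappa; the text does not state the variant explicitly, but this form reproduces the reported value. The function names are illustrative.

```python
# Bayesian confusion matrix for Expert A, copied from Table 3.4(c).
# Assumption: rows are the expert's label (B-I..B-IV), columns the classifier output.
bayes_a = [
    [118, 6, 5, 0],
    [7, 60, 10, 2],
    [0, 6, 53, 11],
    [0, 2, 7, 35],
]

def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def per_class_rate(cm):
    """Correct classification rate of each true class (diagonal over row total)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def cohen_kappa(cm):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

print(round(100 * accuracy(bayes_a)))                      # 83
print([round(100 * r) for r in per_class_rate(bayes_a)])   # [91, 76, 76, 80]
print(round(cohen_kappa(bayes_a), 2))                      # 0.76
```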

The results obtained for Expert B are slightly lower than those obtained for Expert A. Specifically, $74\%$ of the mammograms were correctly classified by the SFS+kNN classifier, while C4.5 reached only $67\%$. kNN performs better than C4.5 in every BIRADS class except BIRADS IV, in which C4.5 clearly outperforms kNN. The Bayes classifier shows an increase in performance of $6$ and $13$ percentage points over kNN and C4.5, respectively, for an overall performance of $80\%$. Considering the individual BIRADS classes, the percentage of correct classification is around $91\%$ for BIRADS I, $83\%$ for BIRADS II, $68\%$ for BIRADS III, and $74\%$ for BIRADS IV. The $\kappa$ value is equal to $0.73$.

The last row of Table 3.4 shows the results obtained for Expert C. The performance of the classifiers is similar to that obtained using the ground truth of Expert B. The kNN classifier obtained $74\%$ correct classification, while C4.5 obtained $72\%$. Using the Bayes classifier, $82\%$ of the mammograms were correctly classified. Per class, the Bayes classifier obtained $86\%$ correct classification for BIRADS I, $74\%$ for BIRADS II, $85\%$ for BIRADS III, and $78\%$ for BIRADS IV. The $\kappa$ value is equal to $0.73$.

In conclusion, the best classification rates are obtained using the Bayesian combination: $83\%$, $80\%$, and $82\%$ correct classification for Experts A, B, and C, respectively.

In line with other publications [17,137], the four-class classification problem can be reduced to a two-class problem: {BIRADS I and II} versus {BIRADS III and IV}, i.e. low density (low risk) versus high density (high risk), which from a mammographic risk assessment point of view might be more appropriate than the four-class division. Using Expert A as ground truth, the percentage of correct classification for low density breasts is about $92\%$ for all three classifiers, while for dense breasts it is $82\%$, $88\%$, and $93\%$ for kNN, C4.5, and the Bayesian combination, respectively. In contrast, for Expert B, the correct classification percentage for low density breasts is around $88\%$ for the single classifiers and $94\%$ for the combination, while for high density breasts it drops to $76\%$ for each single classifier and $86\%$ for their combination. For Expert C, the correct classification percentage for low density breasts is $83\%$ for the single classifiers and $89\%$ for the combination, while for high density breasts kNN obtains $85\%$ and the other classifiers $89\%$.

In summary, for this two-class approach the Bayesian combination achieves $92\%$, $91\%$, and $89\%$ correct classification for Experts A, B, and C, respectively.
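The two-class figures can be derived from the four-class confusion matrices by merging rows and columns. The sketch below does this for the three Bayesian matrices of Table 3.4(c), again assuming that rows hold the expert's label; the names `collapse` and `rates` are illustrative and not part of the original work.

```python
# Bayesian confusion matrices from Table 3.4(c), one per expert.
# Assumption: rows are the expert's label (B-I..B-IV), columns the classifier output.
bayes = {
    "A": [[118, 6, 5, 0], [7, 60, 10, 2], [0, 6, 53, 11], [0, 2, 7, 35]],
    "B": [[78, 6, 2, 0], [10, 93, 8, 1], [0, 16, 55, 10], [0, 1, 10, 32]],
    "C": [[51, 5, 1, 2], [9, 64, 12, 1], [1, 16, 122, 4], [0, 2, 6, 26]],
}

def collapse(cm):
    """Merge B-I/B-II into 'low density' and B-III/B-IV into 'high density'."""
    groups = [(0, 1), (2, 3)]
    return [[sum(cm[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]

def rates(cm2):
    """Return (low-density rate, high-density rate, overall rate) of a 2x2 matrix."""
    low = cm2[0][0] / sum(cm2[0])
    high = cm2[1][1] / sum(cm2[1])
    overall = (cm2[0][0] + cm2[1][1]) / sum(map(sum, cm2))
    return low, high, overall

summary = {expert: [round(100 * r) for r in rates(collapse(cm))]
           for expert, cm in bayes.items()}
print(summary)
# {'A': [92, 93, 92], 'B': [94, 86, 91], 'C': [89, 89, 89]}
```

The last entry of each list is the overall two-class rate, which matches the $92\%$, $91\%$, and $89\%$ quoted above for Experts A, B, and C.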


Arnau Oliver 2008-06-17