ROC Analysis

ROC analysis proceeds from the analysis of a special case of confusion matrix in which there are only two classes: the instances can only be positive or negative. Table C.3 shows a general confusion matrix for this special case. The entries in the confusion matrix have the following meaning:

$a$ is the number of correct predictions that an instance is positive.
$b$ is the number of incorrect predictions that an instance is negative (and actually is positive).
$c$ is the number of incorrect predictions that an instance is positive (and actually is negative).
$d$ is the number of correct predictions that an instance is negative.

Table C.3: Example of confusion matrix with only two classes.

	Automatic
	Positive	Negative
Positive	$a$	$b$
Negative	$c$	$d$

For this $2 \times 2$ confusion matrix, a set of parameters [44] is typically extracted in order to evaluate the result:

Accuracy: is the proportion of the total number of predictions that were correct. It is determined as:

$\displaystyle Accuracy = \frac{a+d}{a+b+c+d}$ (C.2)
True positive rate (also known as recall or sensitivity): is the proportion of positive cases that were correctly identified, as calculated using the equation:

$\displaystyle TPR = \frac{a}{a+b}$ (C.3)
True negative rate (or specificity): is the proportion of negative cases that were correctly identified:

$\displaystyle TNR = \frac{d}{c+d}$ (C.4)
False positive rate: is the proportion of negative cases that were incorrectly classified as positive:

$\displaystyle FPR = \frac{c}{c+d}$ (C.5)
False negative rate: is the proportion of positive cases that were incorrectly classified as negative:

$\displaystyle FNR = \frac{b}{a+b}$ (C.6)
Precision: is the proportion of the predicted positive cases that were correct, as calculated using the equation:

$\displaystyle Precision = \frac{a}{a+c}$ (C.7)
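
As an illustration, the parameters above can be computed directly from the four entries of the confusion matrix. The following is a minimal sketch: the function name `confusion_metrics` and the example counts are hypothetical, and it assumes that every denominator is non-zero. Precision is computed over the predicted positives, that is, $a + c$.

```python
def confusion_metrics(a, b, c, d):
    """Evaluation parameters for a 2x2 confusion matrix.

    a: correct positive predictions (true positives)
    b: positives incorrectly predicted as negative (false negatives)
    c: negatives incorrectly predicted as positive (false positives)
    d: correct negative predictions (true negatives)

    Assumes a + b, c + d and a + c are all non-zero.
    """
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "tpr": a / (a + b),        # recall / sensitivity
        "tnr": d / (c + d),        # specificity
        "fpr": c / (c + d),
        "fnr": b / (a + b),
        "precision": a / (a + c),  # predicted positives are a + c
    }


# Hypothetical counts, chosen only for illustration.
metrics = confusion_metrics(a=40, b=10, c=5, d=45)
```

Note that the true positive rate and the false negative rate sum to one, as do the true negative rate and the false positive rate, so only one rate of each pair needs to be reported.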

A ROC graph is a plot with the false positive rate on the $x$-axis and the sensitivity (the true positive rate) on the $y$-axis. Thus, each axis ranges from $0$ to $1$. The point $(0,1)$ is the perfect classifier: it classifies all positive cases and negative cases correctly. The point $(0,0)$ represents a classifier that predicts all cases to be negative, while the point $(1,1)$ corresponds to a classifier that predicts every case to be positive. The point $(1,0)$ is the classifier that is incorrect for all classifications. When no useful discrimination is achieved, the true positive rate is always equal to the false positive rate, which yields a point on the diagonal line from $(0,0)$ to $(1,1)$.
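
The four corner cases can be checked numerically. The sketch below is only illustrative: the helper `roc_point` and the small set of labels are hypothetical, and the labels are assumed to contain at least one positive and one negative instance.

```python
def roc_point(actual, predicted):
    """Return (false positive rate, true positive rate) for boolean labels."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y and p)          # true positives
    b = sum(1 for y, p in pairs if y and not p)      # false negatives
    c = sum(1 for y, p in pairs if not y and p)      # false positives
    d = sum(1 for y, p in pairs if not y and not p)  # true negatives
    return c / (c + d), a / (a + b)


# Hypothetical ground truth: two positive and three negative instances.
actual = [True, True, False, False, False]

perfect = roc_point(actual, actual)                        # (0.0, 1.0)
all_negative = roc_point(actual, [False] * len(actual))    # (0.0, 0.0)
all_positive = roc_point(actual, [True] * len(actual))     # (1.0, 1.0)
always_wrong = roc_point(actual, [not y for y in actual])  # (1.0, 0.0)
```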

However, a ROC graph contains more information than a single confusion matrix. In many cases, a classifier has a parameter that can be adjusted to increase the true positive rate at the cost of an increased false positive rate. Each parameter setting therefore provides one point on the graph, and varying the parameter traces out a curve.
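
A minimal sketch of this idea follows, assuming that the adjustable parameter is a decision threshold applied to a numeric score (higher scores meaning "more likely positive"). The function name `roc_curve_points`, the scores and the labels are all hypothetical, and the labels are assumed to contain both classes.

```python
def roc_curve_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a decision threshold.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: True for actual positives, False for actual negatives.
    An instance is predicted positive when its score >= threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    # A threshold above every score predicts all cases to be negative.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        a = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)      # true positives
        c = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)  # false positives
        points.append((c / negatives, a / positives))
    return points


# Hypothetical scores and ground truth, chosen only for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
points = roc_curve_points(scores, labels)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep and the last point is always $(1,1)$.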

Figure C.1 shows an example of a ROC graph with two ROC curves and the diagonal line corresponding to classification by chance. The curve that passes closer to the point $(0,1)$, the perfect classifier, obtains the better performance. A measure commonly derived from a ROC curve is the area under the curve [19], commonly called $A_z$, which is an indication of the overall sensitivity and specificity of the observer. The closer the curve comes to the upper-left-hand corner of the graph, the larger the area, up to a maximum of $1$.

**Figure C.1:** Two ROC curves and the diagonal line marking the chance classifier.
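
The area under a ROC curve can be approximated from a finite set of points. The sketch below uses the trapezoidal rule, which is an assumption made here for illustration and not necessarily the estimator used in [19]; the function name `area_under_curve` is hypothetical.

```python
def area_under_curve(points):
    """Trapezoidal area under a ROC curve given as (FPR, TPR) points.

    The points are expected to include (0, 0) and (1, 1).
    """
    ordered = sorted(points)  # sort by FPR, then by TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


chance = area_under_curve([(0.0, 0.0), (1.0, 1.0)])               # diagonal line
perfect = area_under_curve([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])  # passes through (0, 1)
```

Under this approximation the chance diagonal has an area of $0.5$ and a curve passing through the perfect classifier has an area of $1$.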

Arnau Oliver 2008-06-17