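The popular k-Means clustering algorithm, first proposed by MacQueen [108], is an error-minimization algorithm whose objective function is the sum of squared errors:

$$e^2(\mathcal{K}) = \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left\| x_i^{(j)} - c_j \right\|^2$$

In this equation, $\mathcal{K}$ represents the partition of the image $I$ into $K$ clusters, $c_j$ is the centroid of cluster $j$, $n_j$ is the number of patterns assigned to cluster $j$, and $x_i^{(j)}$ is the $i$-th pattern of the image (that is, the $i$-th pixel) belonging to cluster $j$. Two factors have made k-Means one of the most popular clustering algorithms: its time complexity is linear in the number of patterns, and it is easy to implement [78].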
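A minimal sketch of how the sum of squared errors above is minimised, assuming the standard Lloyd iteration: each pattern is assigned to its nearest centroid, and each centroid is then moved to the mean of its patterns. Neither step can increase $e^2$, so the loop stops once no centroid moves. The function names (`sse`, `assign`, `kmeans`), the seeding by drawing $K$ patterns at random, and the rule that an empty cluster keeps its previous centroid are illustrative assumptions, not details taken from the text.

```python
import numpy as np


def sse(X, labels, centroids):
    """Sum of squared errors between each pattern and the centroid of its cluster."""
    return float(np.sum((X - centroids[labels]) ** 2))


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every pattern.

    Builds an (n_patterns, K, n_features) array: clear, but not memory-efficient.
    """
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-Means on an (n_patterns, n_features) array."""
    rng = np.random.default_rng(seed)
    # Seed points: k patterns drawn at random without replacement.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):                 # an empty cluster keeps its old centroid
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):      # no centroid moved: converged
            break
        centroids = new
    labels = assign(X, centroids)            # labels consistent with final centroids
    return labels, centroids
```

For an image, `X` would hold one row per pixel and one column per feature. With two well-separated one-dimensional groups, for example `[0.0, 0.1, 0.2]` and `[10.0, 10.1, 10.2]`, `kmeans(X, 2)` separates them and `sse` returns about 0.04.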
In mammography, the k-Means algorithm has been applied by Sahiner
et al. [155,157], who used the intensity of the pixels as features.
Hence, the suspicious regions are those with a higher average grey
level. In our implementation, the algorithm works with additional
features. The first one, intended to prevent disconnected regions,
is a smoothed version of the original mammogram, as suggested by
Jain et al. [78]. In addition, we have included texture features
derived from co-occurrence matrices [64] and Laws filters [101].
From the co-occurrence matrices, computed for distances one to five
and angles $0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$, the
following statistics have been extracted: contrast, energy, entropy,
and homogeneity. The remaining texture features are based on Laws
energy filters of size five.
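The feature set described above can be sketched as follows, under several assumptions the text does not fix. Co-occurrence matrices are computed on an integer-quantised patch, are symmetric and normalised, and use the offsets (row, column) = (0, d), (-d, d), (-d, 0), (-d, -d) for the four angles. "Energy" is taken as the angular second moment, entropy uses base-2 logarithms, and homogeneity is $\sum P(i,j)/(1+(i-j)^2)$. For the Laws filters, only the four common 5-tap vectors L5, E5, S5, R5 are used, giving 16 masks, with a 15×15 averaging window for the energy maps. A box filter (`box_mean`) stands in for the unspecified smoothing filter. All function names, the window size, and the number of grey levels are hypothetical.

```python
import numpy as np

# Co-occurrence offsets (row, col) per unit distance for 0, 45, 90 and 135 degrees.
ANGLE_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
STATS = ("contrast", "energy", "entropy", "homogeneity")


def glcm(img, dr, dc, levels):
    """Normalised symmetric co-occurrence matrix of an integer image for offset (dr, dc)."""
    h, w = img.shape
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    a = img[r0:r1, c0:c1]                          # reference pixels
    b = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]      # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    P = P + P.T                                    # count both directions
    return P / P.sum() if P.sum() > 0 else P


def glcm_stats(P):
    """Contrast, energy (angular second moment), entropy and homogeneity of P."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "energy": float(np.sum(P ** 2)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
    }


def cooccurrence_features(patch, levels=8):
    """4 statistics x 4 angles x distances 1..5 = 80 values for one quantised patch."""
    feats = []
    for d in range(1, 6):
        for dr, dc in ANGLE_OFFSETS.values():
            stats = glcm_stats(glcm(patch, d * dr, d * dc, levels))
            feats.extend(stats[name] for name in STATS)
    return np.array(feats)


# Common 5-tap Laws vectors: level, edge, spot, ripple.
LAWS_5 = {
    "L5": np.array([1.0, 4.0, 6.0, 4.0, 1.0]),
    "E5": np.array([-1.0, -2.0, 0.0, 2.0, 1.0]),
    "S5": np.array([-1.0, 0.0, 2.0, 0.0, -1.0]),
    "R5": np.array([1.0, -4.0, 6.0, -4.0, 1.0]),
}


def separable_filter(img, col_kernel, row_kernel):
    """Apply outer(col_kernel, row_kernel) as two 1-D convolutions (zero padding)."""
    tmp = np.apply_along_axis(np.convolve, 1, img, row_kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, col_kernel, mode="same")


def box_mean(img, size):
    """Local mean over a size x size window; also usable as a simple smoother."""
    k = np.ones(size) / size
    return separable_filter(np.asarray(img, dtype=float), k, k)


def laws_energy(img, window=15):
    """Texture-energy maps: local mean of |response| for each 5x5 Laws mask."""
    img = np.asarray(img, dtype=float)
    maps = {}
    for cname, cvec in LAWS_5.items():             # vertical factor
        for rname, rvec in LAWS_5.items():         # horizontal factor
            response = separable_filter(img, cvec, rvec)
            maps[cname + rname] = box_mean(np.abs(response), window)
    return maps
```

To build a per-pixel feature vector for k-Means, one could stack the smoothed grey level, the Laws energy maps at that pixel, and `cooccurrence_features` of a quantised window centred on it. On an 8×8 checkerboard with `levels=2`, the distance-1, 0° statistics come out as contrast 1, energy 0.5, entropy 1 bit, and homogeneity 0.5. The separable filters assume every image side is at least as long as the kernel.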
As discussed in Section , the k-Means approach starts by randomly
selecting a predetermined number of seed points. In our experiments,
this number can vary from
to
. However, we have observed that the best performance is reached
when the images are over-segmented. In such cases, the location of
a mass is indicated by concentric regions of decreasing intensity.
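The concentric pattern produced by over-segmentation can be illustrated on a hypothetical, noise-free synthetic "mass": a bright Gaussian bump on a dark background. The sketch below clusters grey level only (the intensity feature used by Sahiner et al.) with more clusters than objects, using $K = 6$ as an arbitrary choice, not a value from the text. It then ranks the clusters by centroid grey level, so rank 0 is the brightest, most suspicious region. Because one-dimensional nearest-centroid assignment splits the grey levels into intervals, and intensity here falls monotonically with distance from the centre, the ranked regions form nested rings. The names `kmeans_1d` and `rank_by_intensity` and all image parameters are illustrative.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=100, seed=0):
    """k-Means on scalar features, seeded with k randomly chosen distinct values."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(np.unique(values), size=k, replace=False).astype(float)
    for _ in range(n_iter):
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        new = np.array([values[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return labels, centroids


def rank_by_intensity(labels, centroids):
    """Relabel clusters by decreasing centroid grey level (rank 0 = brightest)."""
    order = np.argsort(-centroids, kind="stable")   # cluster ids, brightest first
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(len(order))
    return rank[labels]


# Hypothetical synthetic "mass": a bright Gaussian bump on a dark background.
size, sigma, k = 64, 8.0, 6
rr, cc = np.indices((size, size))
dist = np.hypot(rr - size // 2, cc - size // 2)
image = np.exp(-dist ** 2 / (2 * sigma ** 2))

labels, centroids = kmeans_1d(image.ravel(), k)
ranked = rank_by_intensity(labels, centroids).reshape(image.shape)

# Mean distance from the bump centre for each non-empty region, brightest first.
mean_radius = [dist[ranked == r].mean() for r in range(k) if np.any(ranked == r)]
print(np.round(mean_radius, 1))
```

The printed mean radii increase with rank: the brightest cluster sits at the centre of the bump, and each darker cluster forms a ring around it. This mirrors the observation that, with over-segmentation, a mass appears as concentric regions of decreasing intensity. Real mammograms add noise and overlapping tissue, which is what the smoothed-intensity and texture features are meant to handle.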