Perceptual Color Image Segmentation

literature review - cs223b

angela chau + jeff walters
winter qtr 2003



For the final project, we looked at one solution to the problem of color image segmentation proposed by Mirmehdi and Petrou. However, it is useful to note that color image segmentation has been a topic of research for many years, and numerous other methods of solving this problem have been proposed. In this literature survey, we consider two other such approaches, outlining the basic theories, algorithms, and some advantages and disadvantages of the proposed methods. We do not intend to restate the detailed proofs or derivations given in the papers; interested readers should refer to the relevant paper(s) instead.

 

Overview of Color Image Segmentation

Before we consider each of the methods in detail, we briefly look at the field of image segmentation overall. Liu and Yang, in [3], classified techniques for image segmentation into three broad categories: histogram-based, neighborhood-based, and physically-based methods. Histogram-based methods usually perform some sort of clustering in a pre-defined measurement space, e.g. RGB. The implementation proposed by Mirmehdi and Petrou can be considered part of this category, albeit with many more complex extensions than simple clustering. Neighborhood-based techniques consider small local neighborhoods in an image and use this information to aid in decision-making. The Deng and Manjunath paper [2] proposes a method that falls roughly into this class. The third category of approaches uses the actual physics of light and the color formation process to perform segmentation, and the method proposed by Healey in [1] is a prime example.

A major problem in the field of color image segmentation, and in image segmentation in general, has been what is known as "the lack of ground truth" [2]. Until recently, with the creation of the Berkeley Segmentation Dataset, each paper's authors would test their algorithms on completely different sets of images. It was therefore extremely difficult to judge the quality of a segmentation technique, let alone compare it with its peers. The dataset mentioned above marks a first step toward a standard for segmentation. However, segmentation is still an inherently subjective task, and even within the dataset there are numerous inconsistencies. For this reason, comparisons of algorithms have traditionally been quite problematic. In this survey, instead of trying to draw direct comparisons between algorithms, we will list the major advantages and disadvantages we see with each solution and let the reader draw his/her own conclusions.

 

Segmenting Images Using Normalized Color

In [1], Healey proposed a method for segmenting images based on the physics of light and image formation, such that the same object in shadow and in highlight is not segmented into two different regions. The basic idea behind this technique is to determine the number of distinct materials in an image and then, using this information, to label each pixel as belonging to one of these material classes.

Theory

Before describing the algorithm, it is useful to summarize some of the more theoretical findings of the paper. An image consists of sensor measurements of a scene. For a material surface with a given reflectance, illuminated by a single spectral power distribution, the sensor values for that surface will lie on a line in sensor space that passes through the origin. Taking into account the effect of sensor noise, the problem of discriminating between different materials becomes one of finding these various noisy lines within sensor space. Once we know an image contains M different materials, we can assign a sensor vector v to one of these M classes by choosing the assignment that minimizes classification error; Healey does this by assigning v to the class that maximizes a minimum-error-rate discriminant function gi(v).
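The core geometric fact, that measurements of one material under a scaled illuminant lie on a line through the origin of sensor space, and that normalization collapses that line onto a single direction, can be sketched in a few lines. The sensor direction below is a made-up example, not data from the paper:

```python
import numpy as np

# Hypothetical sensor response direction of one material in RGB sensor space.
base = np.array([0.8, 0.5, 0.2])

# Shadow and highlight scale the light reaching the sensor, so the
# measurements lie on a line through the origin: v = s * base.
measurements = np.array([s * base for s in (0.3, 1.0, 2.5)])

# Normalizing each vector to unit length removes the intensity factor,
# collapsing the whole line onto one direction.
normalized = measurements / np.linalg.norm(measurements, axis=1, keepdims=True)

# All normalized measurements coincide (up to floating-point error).
print(np.allclose(normalized, normalized[0]))  # True
```

This is why discrimination in normalized color is insensitive to shadows and intensity changes: only the direction of the line carries material identity.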

Algorithm

The algorithm Healey proposed is an iterative one. In the first iteration, we consider the entire image as one region and place it on a "region stack." We also initialize an empty material list, which will later keep track of the mean and covariance estimates for that material class, computed using the set of sensor measurements already assigned to that class.

During each iteration, we perform edge detection on a region from the stack to identify the number of materials in the region. The assumption here is that different materials in a region will result in edges. If a region contains more than one material, we split the region and push the subregions onto the stack.

If a region contains no edges, then we try to classify the material in the region by first calculating a mean normalized color vector v', and then evaluating the discriminant functions gi(v') of each of the classes on the current material list. If none of the discriminant functions exceeds a certain threshold T, then the region is assumed to be a new material and we add it to the material list. Otherwise, we label the region with the material class that maximizes the discriminant function for the given v' and add its pixels to that class to update the class statistics.
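A minimal sketch of this classification step follows. The Gaussian-style discriminant, the covariance initialization for a new class, and the threshold value are our own simplifications for illustration; Healey's paper defines gi(v) precisely from the estimated class statistics:

```python
import numpy as np

def gaussian_discriminant(v, mean, cov):
    """Minimum-error-rate style discriminant for a Gaussian class
    (log-likelihood up to an additive constant)."""
    diff = v - mean
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * np.log(np.linalg.det(cov))

def classify_region(v_mean, materials, threshold):
    """Assign a region's mean normalized color to an existing material class,
    or start a new class when no discriminant exceeds the threshold.
    `materials` is a list of (mean, covariance) pairs, updated in place."""
    best_idx, best_score = None, threshold
    for i, (mean, cov) in enumerate(materials):
        score = gaussian_discriminant(v_mean, mean, cov)
        if score > best_score:
            best_idx, best_score = i, score
    if best_idx is None:
        # No class matched well enough: start a new material class
        # (small isotropic covariance is an illustrative initialization).
        materials.append((v_mean.copy(), 0.01 * np.eye(len(v_mean))))
        return len(materials) - 1
    return best_idx

materials = []                      # running material list
v = np.array([0.83, 0.52, 0.20])    # mean normalized color of an edge-free region
print(classify_region(v, materials, threshold=-10.0))  # 0: new material created
print(classify_region(v, materials, threshold=-10.0))  # 0: matched existing class
```

In the full algorithm the class mean and covariance would be re-estimated from all pixels assigned to the class, rather than fixed at creation time as in this sketch.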

To find highlights, Healey extended the algorithm described above to first identify candidate highlight regions in the image and then test these regions to confirm they are indeed highlights, rather than a different, brighter material in front of a background material. The observation he made is that if the region is from another material, its sensor values will lie on another line through the origin in sensor space. If, instead, it is a highlight, the sensor values will lie on a line that intersects the hypothesized body reflection line away from the origin.
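One way to sketch the underlying geometric test is to fit a line to a region's sensor values and measure how far that line passes from the origin: a body-reflection line passes through the origin, while a highlight line does not. The colors, the PCA-based line fit, and the thresholds below are illustrative assumptions, not Healey's exact procedure:

```python
import numpy as np

def line_origin_distance(points):
    """Fit a line to sensor measurements via PCA and return the
    distance from the origin of sensor space to that line."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean)
    direction = vt[0]                          # principal direction of the cloud
    offset = mean - (mean @ direction) * direction
    return np.linalg.norm(offset)              # 0 if the line passes through origin

body_color = np.array([0.8, 0.5, 0.2])
# Body reflection under varying intensity: a line through the origin.
body = np.outer(np.linspace(0.2, 1.0, 10), body_color)
# A highlight adds an illuminant-colored specular term to one body point,
# giving a line that meets the body line away from the origin.
illuminant = np.array([1.0, 1.0, 1.0])
highlight = 0.8 * body_color + np.outer(np.linspace(0.0, 0.5, 10), illuminant)

print(line_origin_distance(body) < 1e-6)       # True: passes through origin
print(line_origin_distance(highlight) > 0.1)   # True: misses the origin
```

The highlight line in this toy example intersects the body line at the point 0.8 * body_color, away from the origin, which is exactly the configuration Healey's test looks for.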

Advantages and Disadvantages

The biggest advantage of the technique Healey proposed is that highlights in an image will not be classified as a region separate from the actual material generating the highlight. In addition, segmentation does not rely on image features based on scene geometry so the algorithm is more general in that sense.

From the results presented in [1], we can also list several limitations of this algorithm. The first is that, due to the assumptions made about the illumination conditions of the scene, the test images in the paper were all taken under controlled experimental setups; none were natural images. In particular, the very common case of inter-reflection between objects in a scene is out of the scope of this method. Sensor-wise, both MOS and CCD sensors presented some disadvantages: MOS sensors for their higher noise characteristics, and CCD sensors for the blooming effect exhibited when the scene contained bright reflections from metallic surfaces.

The dependence on edge detection also makes the algorithm sensitive to the quality of the edge detector used. False positives will create extraneous regions and false negatives will group two different materials as one and thereby, mistakenly skew the statistics of the materials.

 

Color Image Segmentation

In [2], Deng and Manjunath proposed a completely different method for color image segmentation known as JSEG. The algorithm relies heavily on the calculation of a measure called J, the concept of J-images, and a multiscale region-growing process.

Theory

The main theory behind the work in [2] lies in the formulation of a segmentation criterion called J. After quantizing an image with a few representative colors and then grouping pixels with the same quantized colors, we obtain a representation of the image called a class map. Using this class map, we calculate J as the ratio of the spatial scatter between different classes to the spatial scatter within each class.

When J is large in a region, the different classes in the region are spatially separate from each other and each class is spatially confined. Thus, there are probably subregions into which we can divide this region. On the other hand, a small J indicates that we have found a "good" region. Note that a "good" region does not necessarily mean a region consisting only of a single class; for example, a region where four different class labels occur in alternation, and thus are uniformly distributed, will also result in a small J value.
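A minimal sketch of computing J from a class map, under our reading of the criterion as the ratio of between-class to within-class spatial scatter of pixel positions:

```python
import numpy as np

def J_value(class_map):
    """J for a class map: between-class over within-class spatial scatter."""
    h, w = class_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    labels = class_map.ravel()
    m = pos.mean(axis=0)
    S_T = ((pos - m) ** 2).sum()               # total scatter of pixel positions
    S_W = sum(((pos[labels == c] - pos[labels == c].mean(axis=0)) ** 2).sum()
              for c in np.unique(labels))      # within-class scatter
    return (S_T - S_W) / S_W

# Two spatially separated classes: large J (a boundary likely exists).
halves = np.zeros((8, 8), dtype=int)
halves[:, 4:] = 1
# The same two classes interleaved like a checkerboard: J near zero,
# matching the "uniformly distributed classes" case described above.
checker = np.indices((8, 8)).sum(axis=0) % 2
print(J_value(halves) > J_value(checker))  # True
```

In the checkerboard case both class means coincide with the global mean, so the within-class scatter equals the total scatter and J collapses to zero, even though two classes are present.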

Algorithm

Based on the development of the J criterion above, Deng and Manjunath proposed the following: instead of attempting to minimize J over the entire image, we calculate J over small windows of the image in order to detect region boundaries (higher J values). The set of J values resulting from looking at the image through a particular window size forms a J-image, which in 3D looks like a terrain map with valleys and peaks corresponding to region centers and boundaries, respectively.

The algorithm they developed in [2] is also an iterative one, working at multiple scales. The initial scale is large, using a large window size and coarse downsampling. Each successive scale reduces the size of the window and uses finer sampling of the image. During each iteration at a particular scale, the J-image is computed for the region under consideration (initially, the whole image is considered to be one region) and, using this terrain map, a set of good valleys (region centers) is determined. From this set of valleys, we begin a region-growing process to arrive at a set of regions for this scale. Each of these regions then undergoes the same segmentation process individually until a scale threshold is reached.

At the end of the algorithm, the image is usually over-segmented, so region merging based on Euclidean color distances is performed until a distance threshold is reached. At this point, we have the final segmentation results.
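The merging step can be sketched as a greedy agglomeration on region mean colors. The size-weighted mean update and the specific distance threshold are our own illustrative choices, not details from [2]:

```python
import numpy as np

def merge_regions(region_colors, max_dist):
    """Greedily merge the closest pair of region mean colors (Euclidean
    distance) until all remaining pairs are farther apart than max_dist."""
    colors = [np.asarray(c, dtype=float) for c in region_colors]
    sizes = [1] * len(colors)
    while len(colors) > 1:
        # Find the closest pair of region colors.
        best, pair = None, None
        for i in range(len(colors)):
            for j in range(i + 1, len(colors)):
                d = np.linalg.norm(colors[i] - colors[j])
                if best is None or d < best:
                    best, pair = d, (i, j)
        if best > max_dist:
            break
        i, j = pair
        # Merge j into i using a size-weighted mean color.
        colors[i] = (sizes[i] * colors[i] + sizes[j] * colors[j]) / (sizes[i] + sizes[j])
        sizes[i] += sizes[j]
        del colors[j], sizes[j]
    return colors

# Two near-identical dark regions merge; the bright one stays separate.
result = merge_regions([[10, 10, 10], [12, 12, 12], [200, 200, 200]], max_dist=20)
print(len(result))  # 2
```

A real implementation would only merge spatially adjacent regions; this sketch ignores adjacency to keep the color-distance criterion in focus.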

Advantages and Disadvantages

Results presented in [2] are very encouraging. Unlike [1], the algorithm has been tested successfully on many natural images. While the iterative algorithm usually results in over-segmentation, the addition of the region merging step reverses this over-segmentation and succeeds in reducing the J value of the image.

One limitation we see is the algorithm's dependence on a good quantization scheme (although one based on perception is referenced in the paper). Also, Deng and Manjunath themselves listed some situations in which their algorithm fails: when two neighboring regions have no clear boundary, and when an object is segmented into several regions due to changes in shading and illumination.

 

Conclusions

As stated in the overview, we would like to let the reader draw his/her own conclusions in comparing these two algorithms with each other and with the Mirmehdi-Petrou algorithm. A few things are worth noting: while the Deng-Manjunath technique has given good results on natural images, Healey's method has only been shown to work under strict conditions created in laboratory settings. Yet Healey's method can classify highlight regions correctly, whereas the Deng-Manjunath method cannot. Choosing an appropriate image segmentation algorithm thus depends highly on the application and on carefully weighing the tradeoffs between the strengths and weaknesses of each technique.

 

References

[1] G. Healey, "Segmenting Images Using Normalized Color," IEEE Transactions on Systems, Man, and Cybernetics, 22(1): 64-73, Jan/Feb 1992.

[2] Y. Deng, B.S. Manjunath, and H. Shin, "Color Image Segmentation," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Fort Collins, CO, USA, June 1999, pp. 446-451, vol. 2.

[3] J. Liu and Y. Yang, "Multiresolution Color Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(7): 689-700, July 1994.