Perceptual Color Image Segmentation
Project Proposal - CS223B & Psych221
Angela Chau + Jeff Walters
The problem of segmenting images into coherent regions has been a major subject of research in the field of computer vision. Most of the literature written on the topic has concentrated on using texture information to perform the segmentation. Some researchers, however, have started looking at how color and texture information can be used in combination during the segmentation process, thus introducing the more specific problem of color image segmentation.
Current methods for segmenting "color textures" (the term often used for the input images of color segmentation algorithms) include clustering in the individual color bands, clustering in RGB space and then merging clusters in Luv space, combining Gabor filters with low-pass color filters, and using Markov Random Field models.
We would like to investigate the specific problem of color image segmentation. Moreover, we would like to incorporate into our solution findings from studies of the human visual system, specifically its responses to color and to the spatial frequencies of color, to see how they can aid or improve the segmentation process.
The ultimate judge of computer segmentation results is human perception, and a successful segmentation algorithm will therefore have a strong perceptual modeling component. We propose to implement the color texture segmentation algorithm outlined by Mirmehdi and Petrou. The algorithm incorporates two important perceptual concepts: color-dependent perceptual smoothing and a multi-scale framework.
The first step is to convert the image to an opponent color space, as described by Wandell and Zhang. The color space is defined by three color planes: O1 (luminance), O2 (red-green), and O3 (blue-yellow). Once the image is projected onto these planes, the variation of the visual system's spatial sensitivity with color can be modeled by applying a separate convolution kernel to each color plane. The sharpest kernel is applied to the luminance channel, where spatial acuity is highest.
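As a rough sketch of this first step (assuming linear sRGB input; the opponent-transform coefficients below are the ones commonly quoted for the S-CIELAB reference implementation and should be verified against the Wandell-Zhang paper):

import numpy as np

# Linear sRGB -> CIE XYZ (D65); standard sRGB matrix.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

# XYZ -> opponent planes O1 (luminance), O2 (red-green), O3 (blue-yellow).
# Coefficients as quoted for the S-CIELAB reference code; check them
# against Wandell and Zhang's paper before relying on them.
XYZ_TO_OPP = np.array([[ 0.279,  0.720, -0.107],
                       [-0.449,  0.290, -0.077],
                       [ 0.086, -0.590,  0.501]])

def rgb_to_opponent(rgb):
    """Convert an (H, W, 3) linear-RGB image to the opponent color space."""
    return rgb @ (XYZ_TO_OPP @ RGB_TO_XYZ).T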
Convolution kernels are defined in cycles per degree, which implies a viewing distance from the image. Mirmehdi and Petrou view segmentation as a process in which decisions at fine scales are made using prior information from decisions made at coarser scales. The move to finer scales can be made by foveating to the area of interest or by moving physically closer to the image (the two are equivalent). This process is modeled in the algorithm by creating a causal, multi-scale tower of images. The image set is described as a tower rather than a pyramid because no sub-sampling takes place between levels: the images at every scale have the same number of pixels.
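A minimal sketch of the tower construction, using per-channel Gaussian widths as a stand-in for the true perceptual kernels (whose widths are specified in cycles per degree and therefore depend on an assumed viewing distance); n_levels, base_sigmas, and step are illustrative parameters, not values from the paper:

import numpy as np
from scipy.ndimage import gaussian_filter

def build_tower(opp, n_levels=4, base_sigmas=(1.0, 2.0, 2.0), step=2.0):
    """Multi-scale tower: every level keeps the full pixel grid; only the
    amount of smoothing changes. base_sigmas gives per-channel widths at
    the finest level, with the sharpest kernel on the luminance plane O1."""
    tower = []
    for level in range(n_levels):
        scale = step ** level
        smoothed = np.stack(
            [gaussian_filter(opp[..., ch], sigma=s * scale)
             for ch, s in enumerate(base_sigmas)], axis=-1)
        tower.append(smoothed)
    return tower  # tower[0] is the finest level, tower[-1] the coarsest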
Segmentation is initiated at the coarsest level using K-means clustering. At this early stage, a large number of clusters is used, and adjacent clusters are then merged based on Euclidean distances across segment boundaries in Luv space. Segmentation then descends the image tower, using the segmentation result at the previous level as prior information. This method, called perceptual probabilistic relaxation, first incorporates information at long distances from the pixel of interest and then relies on information from the local region around the pixel as the algorithm progresses. Classical probabilistic relaxation, on the other hand, relies on local information first and then incorporates information further from the pixel of interest.
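The coarse-level initialization might look like the following sketch. Note two simplifications: clusters are merged by the Luv distance between their centers, rather than across segment boundaries as in the paper, and opponent_to_luv is an assumed helper for the color conversion:

import numpy as np
from sklearn.cluster import KMeans

def coarse_segmentation(coarsest, n_clusters=20, merge_threshold=10.0):
    """Initial over-segmentation at the coarsest tower level, followed by
    greedy merging of clusters whose centers are close in Luv space."""
    h, w, c = coarsest.shape
    features = coarsest.reshape(-1, c)
    km = KMeans(n_clusters=n_clusters, n_init=4).fit(features)
    labels, centers = km.labels_, km.cluster_centers_

    # opponent_to_luv is an assumed helper converting opponent-space colors
    # to Luv; merging by center distance simplifies the paper's
    # boundary-based merging criterion.
    centers_luv = opponent_to_luv(centers)

    # Greedily merge any pair of clusters closer than the threshold.
    mapping = np.arange(n_clusters)
    for i in range(n_clusters):
        for j in range(i + 1, n_clusters):
            if np.linalg.norm(centers_luv[i] - centers_luv[j]) < merge_threshold:
                mapping[mapping == mapping[j]] = mapping[i]
    return mapping[labels].reshape(h, w)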
As mentioned above, and as noted in the Mirmehdi-Petrou paper, the criteria most often used to judge the performance of image segmentation algorithms are subjective, because we usually want the software's segmentation of an image to match the one performed by our own visual systems. It therefore makes sense to use this "measure" to characterize the performance of this project: if the algorithm achieves a segmentation similar to how we would segment the image ourselves, then the algorithm performs well.
To test the algorithm, we will first use a set of generated color textures with very well-defined regions, i.e. clear boundaries between regions, and then a set of natural color images similar to those used in the Mirmehdi-Petrou paper. We plan to find appropriate test images on the web and/or create test images of our own using our digital cameras and Photoshop.
In addition to using the subjective criteria mentioned above to judge the segmentation algorithm, we would also like to compare this algorithm's results against those obtained using a few other schemes.
First, we would like to run the segmentation code described above on images that have not been perceptually filtered. In other words, the multiscale segmentation will be performed on a set of images filtered using simple Gaussian kernels in either the RGB space or the opponent color space (since the opponent color space and the RGB space are both linearly related to the XYZ space, it should not matter which space we run the filters in). This comparison will show us the improvement gained by using Zhang and Wandell's perceptual filters.
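A minimal sketch of this baseline, mirroring the tower construction above but with a single Gaussian width shared by all three channels (the parameter values are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def build_baseline_tower(img, n_levels=4, sigma=1.0, step=2.0):
    """Non-perceptual baseline: the same isotropic Gaussian is applied to
    all three channels at each level, in place of the channel-dependent
    perceptual kernels. Running the rest of the pipeline unchanged on this
    tower isolates the contribution of the perceptual filtering."""
    return [np.stack([gaussian_filter(img[..., ch], sigma=sigma * step ** lev)
                      for ch in range(3)], axis=-1)
            for lev in range(n_levels)]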
Second, we would like to compare our segmentation results against those obtained using a completely different color segmentation method called EdgeFlow, based on the work of Wei Ma and B.S. Manjunath. The EdgeFlow algorithm should provide an interesting comparison because it uses very different concepts, such as Gabor texture feature extraction and segmentation using local texture gradients, and has also been tested on color images. Given the time constraints of the project, we intend to use the existing implementation of the algorithm, available on the EdgeFlow website. If time permits, we may also run the EdgeFlow algorithm on images filtered with the perceptual filters to see what results that yields.
The following is a list of papers containing the algorithm we intend to implement:
Furthermore, we have found the following set of related publications and may use these as additional references during the project:
Papers on perceptual filters:
Papers on multiscale segmentation: