Previous Work

Approaches to Foveation

When rendered on the screen, foveated images take up as many pixels as unfoveated images. Therefore, to realize any data savings from foveation, some algorithm must be used to generate the full image from a reduced data set. In the literature, there appear to be four broad implementation approaches.

One approach that has been used is to directly warp the image using one of a number of nonlinear operations. Wiebe and Basu describe a host of these algorithms. Most of these methods result in compressed images with oddly-shaped boundaries, which are often difficult to transmit and decode in a standard way. In addition, many of the resolution functions used have singularities at the fovea, meaning that special processing must be used in the area of highest fidelity. Finally, these algorithms become even more difficult to implement if the fovea does not lie at the image center. Because of their inflexibility, these algorithms are seldom used for general-purpose image foveation.

A second approach found in the literature (see, for example, Tsumura et al.) is variable-quality DCT compression. In standard block DCT compression, a single quality factor is set for the entire image, and the image is uniformly degraded by that factor. To "foveate" such an image, the quality factor is allowed to vary from one 8x8 DCT block to the next. For blocks close to the fovea center, the quality factor approaches 100%. As blocks get farther from the fovea, it decreases in proportion to visual acuity. Because lower quality factors generally represent higher spatial frequencies more poorly, the net effect is to reduce spatial resolution away from the fovea. The maximum degradation is bounded by the block size: even at the lowest quality, each block retains its DC (average) coefficient, so resolution cannot fall below one sample per 8x8 block. While this approach shows promise, it is unclear how the spatial sensitivity of the human visual system should be mapped onto DCT quality factors. Block boundary artifacts also become a problem far from the fovea, where quantization is coarsest.
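The per-block quality schedule can be sketched as follows. This is a minimal illustration, not Tsumura et al.'s method: the function name `block_quality_map`, the hyperbolic acuity ratio e2 / (e + e2) with e2 = 2.3 degrees, the uniform pixels-per-degree conversion, and the quality floor `q_min` are all assumptions chosen for the example.

```python
import math


def block_quality_map(width, height, fovea_x, fovea_y, pixels_per_degree,
                      e2=2.3, q_min=10, block=8):
    """Return a rows x cols grid of quality factors (1-100), one per block.

    Hypothetical sketch. Assumptions: quality is proportional to the relative
    acuity e2 / (e + e2), where e is the eccentricity (degrees) of the block
    centre; pixels map to degrees with one uniform scale; q_min is an
    arbitrary floor so that no block is coded at quality 0.
    """
    rows = math.ceil(height / block)
    cols = math.ceil(width / block)
    qmap = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Centre of this block in pixel coordinates.
            cx = c * block + block / 2
            cy = r * block + block / 2
            ecc = math.hypot(cx - fovea_x, cy - fovea_y) / pixels_per_degree
            acuity = e2 / (ecc + e2)  # 1.0 at the fovea, falling with eccentricity
            row.append(max(q_min, round(100 * acuity)))
        qmap.append(row)
    return qmap
```

Applying such a map requires an encoder that lets quantization vary from block to block; baseline JPEG applies a fixed quantization table to every block of a color component, so it cannot do this directly.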

Figure 3: Foveated version of multiresolution pyramid. A region of smaller spatial extent is taken from each higher-resolution level of the pyramid. (From Geisler and Perry)

Geisler and Perry have proposed using an image pyramid representation to foveate images. In a standard image pyramid, as described by Burt and Adelson, the image is repeatedly blurred and subsampled to create copies of the original at successively lower resolutions. In the foveated version, each higher-resolution level of the pyramid represents a smaller region of the image around the fovea. Figure 3 shows a standard pyramid representation with subsets removed from each level for foveation.
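A foveated pyramid of this kind might be sketched as below. The 2x2 averaging in `reduce_level` is a simplified stand-in for Burt and Adelson's Gaussian blur, and the window rule in the hypothetical `foveated_pyramid` (a fixed half-width `radius` in each level's own pixels, so each finer level covers half the extent of the level above it) is an assumption for illustration, not Geisler and Perry's exact scheme. Images are plain lists of rows with power-of-two dimensions.

```python
def reduce_level(img):
    """Halve resolution by averaging each 2x2 block.

    Simplified stand-in for Burt and Adelson's REDUCE step, which blurs with
    a 5-tap Gaussian-like kernel before subsampling.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)]
            for y in range(h)]


def foveated_pyramid(img, levels, fovea, radius):
    """Build a pyramid and keep only a foveal window of each finer level.

    Hypothetical rule: level 0 is full resolution and the last level is kept
    whole; every other level k keeps `radius` level-k pixels on each side of
    the fovea, i.e. radius * 2**k full-resolution pixels. `fovea` is (x, y)
    in full-resolution pixels. Returns (level, (x0, y0, x1, y1), subimage).
    """
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))

    fx, fy = fovea
    out = []
    for k, level in enumerate(pyramid):
        h, w = len(level), len(level[0])
        if k == levels - 1:
            x0, y0, x1, y1 = 0, 0, w, h          # coarsest level: whole image
        else:
            cx, cy = fx // 2 ** k, fy // 2 ** k  # fovea in level-k pixels
            x0, y0 = max(0, cx - radius), max(0, cy - radius)
            x1, y1 = min(w, cx + radius), min(h, cy + radius)
        sub = [row[x0:x1] for row in level[y0:y1]]
        out.append((k, (x0, y0, x1, y1), sub))
    return out
```

For a 16x16 image with fovea (8, 8), three levels and radius 2, level 0 keeps the window (6, 6, 10, 10), level 1 keeps (2, 2, 6, 6) in its own coordinates, and the 4x4 top level is kept whole.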

This manner of image decomposition lends itself to fovea-first transmission. If the foveal regions of the pyramid are transmitted first, the remaining data can be transmitted later without resending anything. The only overhead relative to transmitting an unfoveated pyramid is the small cost of sending the coordinates of each foveal subregion.
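The claim that fovea-first ordering adds no redundant data can be checked with simple bookkeeping. The hypothetical `fovea_first_schedule` below sends every level's foveal window first and the remainder of each level afterward; the assumption that each window costs four coordinate values is ours.

```python
def fovea_first_schedule(level_sizes, windows):
    """Order pyramid data for fovea-first transmission (hypothetical sketch).

    level_sizes: (width, height) of each pyramid level.
    windows: (x0, y0, x1, y1) foveal window for each level.
    Returns (packets, header_values). Each packet is (phase, level, samples):
    phase "fovea" sends every foveal window, phase "rest" sends what remains
    of each level, so no sample is sent twice. Assumption: each window is
    described by four extra coordinate values.
    """
    packets = []
    for k, (x0, y0, x1, y1) in enumerate(windows):
        packets.append(("fovea", k, (x1 - x0) * (y1 - y0)))
    for k, ((w, h), (x0, y0, x1, y1)) in enumerate(zip(level_sizes, windows)):
        packets.append(("rest", k, w * h - (x1 - x0) * (y1 - y0)))
    header_values = 4 * len(windows)
    return packets, header_values
```

For the windows of a 16x16, three-level pyramid, the packets total 336 samples, exactly the 256 + 64 + 16 of the unfoveated pyramid, plus 12 coordinate values.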

Finally, a growing body of recent literature uses wavelet transforms to foveate images. Mallat describes the general form of wavelet decompositions, which are very similar in spirit to the multiresolution pyramid approach. One level of a two-dimensional wavelet decomposition splits an image into four subimages, each one quarter the size of the original, from which the original can be fully reconstructed. These subimages represent the low- and high-frequency components of the image in each of the two dimensions. The advantage of this approach over the Laplacian pyramid is that no redundant information is added in building the pyramid. Because the high-frequency subimages carry only fine detail, the portions of them that are far from the fovea need not be transmitted. This approach has been studied by Chang and Yap, as well as by Dreizen. As with the pyramid, bandwidth is reduced by sending only data that the viewer can resolve, and the approach is easily modified to accommodate fovea-first transmission.
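The decomposition can be illustrated with the Haar wavelet, the simplest choice; it is used here for brevity, and the cited authors may use other wavelets. `haar2d` splits an image with even dimensions into four quarter-size subbands, `ihaar2d` inverts it exactly, and the hypothetical `foveate_details` zeroes detail coefficients beyond a radius (an assumed cutoff rule) so that they would not need to be transmitted.

```python
import math


def haar2d(img):
    """One level of a 2-D Haar decomposition (even width and height).

    Returns LL, LH, HL, HH, each a quarter the size of img.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            i, j = y // 2, x // 2
            LL[i][j] = (a + b + c + d) / 4  # local average (low-low)
            LH[i][j] = (a - b + c - d) / 4  # left-minus-right column difference
            HL[i][j] = (a + b - c - d) / 4  # top-minus-bottom row difference
            HH[i][j] = (a - b - c + d) / 4  # diagonal difference
    return LL, LH, HL, HH


def ihaar2d(LL, LH, HL, HH):
    """Exactly invert haar2d."""
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            img[2 * i][2 * j] = s + p + q + r
            img[2 * i][2 * j + 1] = s - p + q - r
            img[2 * i + 1][2 * j] = s + p - q - r
            img[2 * i + 1][2 * j + 1] = s - p - q + r
    return img


def foveate_details(band, fovea, radius):
    """Zero coefficients farther than `radius` subband pixels from `fovea`.

    Hypothetical cutoff rule; zeroed coefficients need not be transmitted.
    """
    fx, fy = fovea
    return [[v if math.hypot(x - fx, y - fy) <= radius else 0.0
             for x, v in enumerate(row)]
            for y, row in enumerate(band)]
```

When all three detail coefficients of a block are zeroed, `ihaar2d` fills that 2x2 block with its average, which is exactly the loss of resolution intended away from the fovea.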

Foveal Resolution Functions

Regardless of which image decomposition is employed, the algorithm must determine the appropriate resolution at each point in the transmitted image. This requires a function that approximates the spatial response of the human visual system, and the literature describes several such functions.

Schwartz studied the mapping of visual signals onto the visual cortex and determined that a logarithmic resolution function would be most appropriate. Though this function is biologically based, it is difficult to implement because of singularities along one axis. Simplifications of it therefore abound; Wiebe and Basu describe an array of variable-resolution transforms designed for ease of computation and storage.
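The difficulty with a logarithmic mapping can be seen in a generic log-polar sampling grid, sketched below. This illustrates log-polar sampling in general, not Schwartz's exact cortical model, and the function names are hypothetical: ring radii are spaced uniformly in log r, so sampling density falls with eccentricity, but log r is undefined at r = 0, which is why the fovea itself needs special handling.

```python
import math


def log_polar_rings(r_min, r_max, n_rings):
    """Ring radii spaced uniformly in log r, from r_min to r_max (n_rings >= 2)."""
    if r_min <= 0:
        # log r diverges at the fovea, so the innermost ring must be offset.
        raise ValueError("log r is undefined at r = 0; choose r_min > 0")
    step = math.log(r_max / r_min) / (n_rings - 1)
    return [r_min * math.exp(k * step) for k in range(n_rings)]


def to_log_polar(x, y, fovea):
    """Map (x, y) to (log r, theta) about the fovea; singular at the fovea."""
    dx, dy = x - fovea[0], y - fovea[1]
    r = math.hypot(dx, dy)
    if r == 0:
        raise ValueError("the logarithmic mapping is singular at the fovea")
    return math.log(r), math.atan2(dy, dx)
```

With `log_polar_rings(1, 64, 7)` each ring is twice the radius of the previous one (1, 2, 4, ..., 64, up to rounding), so equal steps in the transformed coordinate cover ever larger areas of the image.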

Geisler and Perry use a variation on Schwartz's resolution function that appears to have been well explored in the literature. The function is

$$ f = \frac{e_2 \ln(1/CT_0)}{\alpha\,(e_c + e_2)} \qquad (1) $$

where e2 is the eccentricity (in degrees) at which visual acuity falls to half its value at the center of the fovea, alpha is a decay constant, and CT0 is the minimum contrast threshold of the visual system. The result f is the maximum discernible spatial frequency (in cycles per degree, cpd) at eccentricity ec (in degrees).
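A short derivation makes the role of e2 concrete. Assuming the contrast-threshold model used by Geisler and Perry, CT(f, e) = CT0 exp(alpha f (e + e2) / e2), a frequency stops being discernible when its threshold reaches the maximum possible contrast of 1. Solving for f gives the cutoff frequency, and dividing by its foveal value cancels alpha and CT0:

```latex
CT(f, e_c) = CT_0 \exp\!\left(\alpha f \,\frac{e_c + e_2}{e_2}\right) = 1
\;\Longrightarrow\;
f(e_c) = \frac{e_2 \ln(1/CT_0)}{\alpha\,(e_c + e_2)},
\qquad
\frac{f(e_c)}{f(0)} = \frac{e_2}{e_c + e_2},
\qquad
\frac{f(e_2)}{f(0)} = \frac{1}{2}.
```

The ratio e2 / (ec + e2) is the relative acuity at eccentricity ec; it equals exactly 1/2 at ec = e2, consistent with e2 being the half-resolution eccentricity.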

Using this formula, Arnow and Geisler found that the best fit for humans occurred with alpha = 0.106, e2 = 2.3 degrees, and CT0 = 1/75. Other researchers have published similar results.
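The resolution function and these fitted values can be combined to decide how coarse a representation each eccentricity tolerates. In the sketch below, `cutoff_frequency` evaluates f(ec) = e2 ln(1/CT0) / (alpha (ec + e2)) with the Arnow and Geisler parameters quoted above; `pyramid_level`, and its assumption that each pyramid level halves the sampling rate of a display with a given number of pixels per degree, are hypothetical and for illustration only.

```python
import math

ALPHA = 0.106  # decay constant (Arnow and Geisler fit)
E2 = 2.3       # half-resolution eccentricity, degrees
CT0 = 1 / 75   # minimum contrast threshold


def cutoff_frequency(ecc_deg, alpha=ALPHA, e2=E2, ct0=CT0):
    """Highest discernible spatial frequency (cpd) at eccentricity ecc_deg."""
    return e2 * math.log(1 / ct0) / (alpha * (ecc_deg + e2))


def pyramid_level(ecc_deg, pixels_per_degree, max_level):
    """Coarsest pyramid level whose Nyquist limit still covers the cutoff.

    Hypothetical assumption: level k samples at pixels_per_degree / 2**k per
    degree, so it represents frequencies up to pixels_per_degree / 2**(k + 1)
    cpd. Returns 0 when even full resolution falls short of the cutoff.
    """
    fc = cutoff_frequency(ecc_deg)
    level = 0
    # Step to the next coarser level only while its Nyquist limit >= cutoff.
    while level < max_level and pixels_per_degree / 2 ** (level + 2) >= fc:
        level += 1
    return level
```

For a display with 30 pixels per degree, `pyramid_level(0, 30, 5)` returns 0, since the foveal cutoff of about 40.7 cpd exceeds the display's 15 cpd Nyquist limit, while `pyramid_level(20, 30, 5)` returns 1: at 20 degrees the cutoff is about 4.2 cpd, which level 1 (Nyquist 7.5 cpd) still covers but level 2 (3.75 cpd) does not.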

Bill Overall / March 12, 1999