Introduction

The Human Fovea

The human visual system samples the world nonuniformly. At the retina, there is a high concentration of cone photoreceptors only within a few degrees of the point of gaze. Outside of this central region, or fovea, the eye's spatial resolution drops significantly. Studies of cone density have found that the spatial resolution of human vision is cut in half at about 2.3 degrees from the point of fixation. At twenty degrees from the point of fixation, the maximum spatial resolution that the human eye can perceive is reduced by a factor of ten.
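The two numbers quoted above are consistent with a simple hyperbolic falloff, in which resolution relative to the fovea is e2 / (e2 + e) for eccentricity e and a half-resolution eccentricity e2 of 2.3 degrees. The sketch below illustrates that model. The functional form and the names `relative_resolution` and `HALF_RES_ECCENTRICITY_DEG` are assumptions made for illustration, not something stated by the cone-density measurements themselves.

```python
# Assumed model (not from the source): resolution falls off hyperbolically
# with eccentricity, reaching one half at e2 degrees from the point of gaze.
HALF_RES_ECCENTRICITY_DEG = 2.3  # from the text: resolution halves near 2.3 degrees


def relative_resolution(eccentricity_deg, e2=HALF_RES_ECCENTRICITY_DEG):
    """Return spatial resolution relative to the fovea (1.0 at the point of gaze)."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return e2 / (e2 + eccentricity_deg)


if __name__ == "__main__":
    # Print the model's value at a few eccentricities, in degrees.
    for ecc in (0.0, 2.3, 10.0, 20.0):
        print(f"{ecc:5.1f} deg -> {relative_resolution(ecc):.3f}")
```

Under this assumed model, the value at twenty degrees is roughly 0.10, in line with the factor-of-ten reduction quoted above.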

Matching Image Resolution to the Human Visual System

In certain situations, it may be appropriate to mimic this nonuniform resolution in the context of digital imaging and video. The most obvious application is in image and video transmission systems where bandwidth is at a premium. For instance, consider an image database that exists at a remote location. If the network connection between the user and the image server is slow, then compression algorithms may be necessary.

Traditional image and video compression strategies already consider some of the limitations of the human visual system. For instance, chrominance information is usually sampled twice as coarsely as luminance information. In addition, only three values are used to specify the color and intensity of each pixel since the visual system has only three types of cone photoreceptors. However, the spatial variation in resolution of the human visual system is not exploited in any standard compression strategy. If system bandwidth limitations require more compression than is possible through conventional methods, then one could further compress the data by applying a spatially variant filter to the image. This filter would maintain high fidelity around the point of gaze, while reducing spatial resolution away from the point of gaze according to the sensitivity function of the human visual system.
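As a rough illustration of a spatially variant filter, the sketch below averages each pixel over a window whose size grows with distance from the point of gaze. It is a minimal sketch under several assumptions: box averaging stands in for a proper low-pass filter, the falloff follows the hypothetical model e2 / (e2 + e), and `pixels_per_degree` is an invented parameter describing the viewing geometry. It is not the filter used in this study.

```python
import math


def foveate(image, gaze_row, gaze_col, pixels_per_degree=10.0, e2=2.3):
    """Blur a grayscale image (a list of rows) more strongly away from the gaze point."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            # Eccentricity of this pixel in degrees (assumes a flat display).
            ecc = math.hypot(r - gaze_row, c - gaze_col) / pixels_per_degree
            # Assumed model: relative resolution is e2 / (e2 + ecc), so the
            # averaging window spans about 1 + ecc / e2 pixels.
            radius = int(ecc / (2.0 * e2))
            # Clip the window at the image borders.
            r0, r1 = max(0, r - radius), min(height, r + radius + 1)
            c0, c1 = max(0, c - radius), min(width, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # A checkerboard keeps its contrast at the gaze point and fades to gray far away.
    board = [[255 * ((i + j) % 2) for j in range(40)] for i in range(40)]
    result = foveate(board, 0, 0, pixels_per_degree=1.0)
    print(result[0][0], round(result[39][39], 1))
```

Pixels at the point of gaze pass through unchanged, since the window there is a single pixel, while fine detail far from the gaze point is averaged away.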

An example of such an image is shown in Figure 1. The left half of this image is foveated; that is, the resolution of the image is spatially varied to match the capabilities of the human visual system. For reference, the right half of the image has not been modified. If one looks at the top portion of the image from a reasonable distance, the left half of the image should not look any different from the right half. Therefore, this image is said to be perceptually lossless (assuming that the point of gaze is at the top of the image).

Figure 2: In many proposed applications of foveated imaging, there is a remote image server which must communicate through a low-bandwidth network to a local machine. If the local machine is capable of complex image-processing functions, then the possibility of speeding transmission through foveation exists.

Applications Requiring Eye Tracking

While perceptually lossless image and video transmission is a worthy goal, several obstacles keep this approach from being widely practical. First, in order to determine the point of gaze, the viewer must wear a somewhat unwieldy eye-tracking apparatus. In most situations, this is too costly and annoying to be feasible. For instance, the average internet user does not want to wear headgear in order to browse web pages. On the other hand, someone wearing a virtual reality helmet would hardly notice the addition of an eye tracker within the helmet. In this application, eye tracking might be appropriate if there were a compelling reason for video bandwidth reduction.

In addition, the idea of eye tracking is less practical for still images. If the image must be retransmitted each time the eye moves, then the bandwidth savings from foveation will quickly be lost. This solution is only useful in the context of video transmission, and in that case, the position of the eye must reach the video server rapidly enough that the appropriately foveated frame can be transmitted. In packet-switched contexts without quality of service guarantees (such as the internet), this upstream requirement may not be met.

Applications without Eye Tracking

Although eye tracking is required for true perceptually lossless foveation, it is not necessary to perform eye tracking for foveation in general. If the images to be transmitted have certain "natural" points of gaze, then the image can be pre-foveated with the anticipation that the viewer will be looking in a certain location.

In this study, two applications of this type of foveation are considered. First, an image is foveated about a specific point (or a number of points) for the purpose of image compression. This method will be compared to the standard method of lossless compression using a Laplacian pyramid.

Second, foveation will be applied to images as a way of providing the most useful information in an image as quickly as possible. The foveated image is transmitted first, and the remaining information is then transmitted to provide a complete, unfoveated representation. In an application capable of decoding such images, the image can be rendered as data is received, so a foveated image is displayed first, which may be sufficient for the viewer's purposes. If the transmission is allowed to continue to completion, the entire unfoveated image will eventually be displayed. This method will be shown to provide compression that is almost identical to the compression achieved by a standard multiresolution pyramid.
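One way to picture this ordering is to split the coefficients of a multiresolution pyramid into two groups: those needed to render the foveated image, and the rest. The sketch below does this for coefficient indices only. It rests on assumptions that are not taken from this study: each pyramid level holds one coefficient per 2^level-pixel block, the finest level needed at a given eccentricity follows the hypothetical e2 / (e2 + e) falloff, and the names `finest_level_needed` and `fovea_first_order` are invented for illustration.

```python
import math


def finest_level_needed(ecc_deg, e2=2.3):
    """Finest pyramid level (0 = full resolution) worth sending at this eccentricity."""
    # Assumed model: the eye resolves detail no finer than about 1 + ecc / e2 pixels.
    return int(math.log2(1.0 + ecc_deg / e2))


def fovea_first_order(height, width, num_levels, gaze, pixels_per_degree=10.0):
    """Split (level, row, col) coefficient indices into foveal and remaining groups."""
    foveal, remainder = [], []
    for level in range(num_levels - 1, -1, -1):  # coarse to fine
        step = 2 ** level
        for r in range(math.ceil(height / step)):
            for c in range(math.ceil(width / step)):
                # Center of this coefficient's footprint in full-resolution pixels.
                y, x = (r + 0.5) * step, (c + 0.5) * step
                ecc = math.hypot(y - gaze[0], x - gaze[1]) / pixels_per_degree
                # The coarsest level is always part of the foveated image.
                needed = min(finest_level_needed(ecc), num_levels - 1)
                target = foveal if level >= needed else remainder
                target.append((level, r, c))
    return foveal, remainder


if __name__ == "__main__":
    first, rest = fovea_first_order(64, 64, 4, gaze=(32, 32), pixels_per_degree=1.0)
    print(len(first), "coefficients sent first,", len(rest), "sent afterwards")
```

A sender following this sketch would transmit the first list and then the second. The total amount of data is unchanged; only the order differs, which is why the full image is still recovered if the transmission runs to completion.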

An example of this "fovea-first" transmission strategy is shown in Movie 1. This movie shows what might be rendered on screen as a remote server sends a fovea-first image. As more data becomes available, better approximations to the original image can be made. After the foveated image is rendered, the remaining image data are sent in order to complete the undegraded image.

One application of this strategy is in cases where large databases of images must be sorted through remotely. For instance, imagine that a remote server contains the photographs of all students at a particular university. If a remote user wants to find a particular student by recognizing their face, then that user might appreciate having the useful portion of each image appear as quickly as possible. If foveated images are sent initially, then the user can rapidly reject faces which don't match and move on to the next student in the database. When a potential match is encountered, the user can wait for the entire uniform-resolution image to be rendered.

Bill Overall / March 12, 1999