
The Human Fovea
The human visual system samples the world nonuniformly. On the
retina, cone photoreceptors are highly concentrated only within a few
degrees of the point of gaze.
Outside of this central region, or fovea, the eye's spatial resolution
drops significantly. Studies of cone density indicate that the spatial
resolution of human vision is halved at about 2.3 degrees from the
point of fixation. At twenty degrees from the point of fixation, the
maximum spatial resolution the eye can perceive is reduced by a factor
of ten.
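The two figures above can be tied together with a simple falloff model.
The sketch below assumes a hyperbolic form, e2 / (e2 + e), where e is the
angular distance from fixation and e2 = 2.3 degrees is the half-resolution
eccentricity quoted in the text. The functional form and the names
relative_resolution and E2_DEGREES are assumptions made for illustration;
the text itself gives only the two data points, which this form happens to
reproduce closely.

```python
# Assumed model: resolution relative to the fovea falls off as e2 / (e2 + e).
E2_DEGREES = 2.3  # eccentricity at which resolution is halved (from the text)


def relative_resolution(eccentricity_deg, e2=E2_DEGREES):
    """Fraction of foveal resolution at a given angle from fixation."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return e2 / (e2 + eccentricity_deg)


for ecc in (0.0, 2.3, 20.0):
    print(f"{ecc:5.1f} deg -> {relative_resolution(ecc):.3f}")
# Prints 1.000, 0.500 and 0.103: half resolution at 2.3 degrees and
# roughly one tenth at 20 degrees.
```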
Matching Image Resolution to the Human Visual System
In certain situations, it may be appropriate to mimic this nonuniform
resolution in digital imaging and video. The most obvious application
is in image and video transmission systems where bandwidth is at a
premium.
For instance, consider an image database that exists at a remote
location. If the network connection between the user and the image
server is slow, then compression algorithms may be necessary.
Traditional image and video compression strategies already consider
some of the limitations of the human visual system. For instance,
chrominance information is usually sampled twice as coarsely as
luminance information. In addition, only three values are used
to specify the color and intensity of each pixel since the visual
system has only three types of cone photoreceptors. However, the
spatial variation in resolution of the human visual system is not
exploited in any standard compression strategy. If system bandwidth
limitations require more compression than is possible through
conventional methods, then one could further compress the data by
applying a spatially variant filter to the image. This filter would
maintain high fidelity around the point of gaze, while reducing
spatial resolution away from the point of gaze according to the
sensitivity function of the human visual system.
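As a rough illustration of such a spatially variant filter, the sketch
below replaces each pixel with a block average whose block size grows with
distance from the point of gaze. This is a crude stand-in chosen for
brevity, not the filter used in this study. The function names, the
pixels_per_degree parameter (which depends on display size and viewing
distance), the hyperbolic falloff with e2 = 2.3 degrees, and the
power-of-two block sizes are all assumptions.

```python
import numpy as np


def block_average(image, factor):
    """Average factor x factor blocks, then expand back to the input size."""
    h, w = image.shape
    hp = -(-h // factor) * factor  # round up to a multiple of factor
    wp = -(-w // factor) * factor
    padded = np.pad(image, ((0, hp - h), (0, wp - w)), mode="edge")
    blocks = padded.reshape(hp // factor, factor, wp // factor, factor)
    coarse = blocks.mean(axis=(1, 3))
    expanded = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    return expanded[:h, :w]


def foveate(image, gaze_rc, pixels_per_degree, e2=2.3, max_level=4):
    """Lower the resolution of a grayscale image away from gaze_rc (row, col)."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    ecc_deg = np.hypot(rows - gaze_rc[0], cols - gaze_rc[1]) / pixels_per_degree
    # Assumed falloff: resolution ~ e2 / (e2 + e), so the tolerable block
    # size grows as (e2 + e) / e2. Round down to a power of two.
    tolerable = (e2 + ecc_deg) / e2
    level = np.clip(np.floor(np.log2(tolerable)), 0, max_level).astype(int)
    out = image.copy()
    for k in range(1, max_level + 1):
        mask = level == k
        if mask.any():
            out[mask] = block_average(image, 2 ** k)[mask]
    return out
```

With this choice, every pixel closer than 2.3 degrees to the point of gaze
is left untouched, and the block size doubles each time the modeled
resolution halves again.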
An example of such an image is shown in
Figure 1. The left half of this
image is foveated; that is, the resolution of the image is spatially
varied to match the capabilities of the human visual system. For
reference, the right half of the image has not been modified. If one looks
at the top portion of the image from a reasonable distance, the left
half of the image should not look any different from the right half.
Therefore, this image is said to be perceptually lossless (assuming
that the point of gaze is at the top of the image).

Applications Requiring Eye Tracking
While perceptually lossless image and video transmission is a worthy goal, several obstacles keep this approach from being widely practical. First, determining the point of gaze requires the viewer to wear a somewhat unwieldy eye tracking apparatus. In most situations, this is too costly and annoying to be feasible. For instance, the average internet user does not want to wear headgear in order to browse web pages. On the other hand, someone already wearing a virtual reality helmet would hardly notice the addition of an eye tracker inside it. In that application, eye tracking might be appropriate if there were a compelling reason to reduce video bandwidth.
In addition, eye tracking is less practical for still images. If the image must be retransmitted each time the eye moves, then the bandwidth savings from foveation will quickly be lost. Eye-tracked foveation is therefore useful only for video transmission, and even then the eye position must reach the video server quickly enough for the appropriately foveated frame to be transmitted in time. In packet-switched networks without quality of service guarantees (such as the internet), this upstream requirement may not be met.
Applications without Eye Tracking
Although eye tracking is required for true perceptually lossless foveation, it is not necessary to perform eye tracking for foveation in general. If the images to be transmitted have certain "natural" points of gaze, then the image can be pre-foveated with the anticipation that the viewer will be looking in a certain location.
In this study, two different applications of this type of foveation are considered. First, foveation is applied around a specific point in an image (or a number of points) for the purpose of image compression. This method will be compared to the standard method of lossless compression using a Laplacian pyramid.
Second, foveation will be applied to images as a way of providing the most useful information in an image as quickly as possible. The foveated image is transmitted first, and the remaining information follows in order to provide a complete, unfoveated representation. An application capable of decoding such images can render the image as data is received: a foveated image is displayed first, which may be sufficient for the viewer's purposes, and if the transmission is allowed to continue to completion, the entire unfoveated image is eventually displayed. This method will be shown to provide compression that is almost identical to the compression achieved by a standard multiresolution pyramid.
An example of this "fovea-first" transmission strategy is shown in Movie 1. This movie shows what might be rendered on screen as a remote server sends a fovea-first image. As more data becomes available, better approximations to the original image can be made. After the foveated image is rendered, the remaining image data are sent in order to complete the undegraded image.
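The fovea-first ordering can be sketched with a small multiresolution pyramid. The example below is an illustration under stated assumptions, not the coder used in this study: it substitutes a block-average pyramid for a true Laplacian pyramid, assumes image dimensions divisible by 2 to the power of the number of levels, and uses a hypothetical rule in which the first-pass radius halves at each finer level. All function names are invented for the example.

```python
import numpy as np


def downsample(img):
    """Halve each dimension by averaging 2 x 2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double each dimension by pixel replication."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def build_pyramid(image, levels):
    """Return (base, details), with details ordered coarsest first."""
    details = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        coarser = downsample(current)
        details.append(current - upsample(coarser))
        current = coarser
    return current, details[::-1]


def fovea_masks(details, gaze_frac, radius_frac):
    """Per level, mark the coefficients that go out in the first pass.

    gaze_frac is (row, col) as fractions of the image size. The radius
    halves at each finer level (an assumed rule), so fine detail is sent
    early only near the point of gaze.
    """
    masks = []
    for i, d in enumerate(details):
        h, w = d.shape
        rows, cols = np.mgrid[0:h, 0:w]
        dist = np.hypot(rows / h - gaze_frac[0], cols / w - gaze_frac[1])
        masks.append(dist <= radius_frac / 2 ** i)
    return masks


def reconstruct(base, details, masks=None):
    """Rebuild the image; with masks, use only first-pass coefficients."""
    current = base
    for i, d in enumerate(details):
        kept = d if masks is None else np.where(masks[i], d, 0.0)
        current = upsample(current) + kept
    return current
```

Calling reconstruct with the masks gives the foveated image that can be displayed after the first pass, and calling it without masks gives the full image once the remaining coefficients have arrived. In this sketch the two passes together carry exactly the same coefficients as the ordinary pyramid, only in a different order.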
One application of this strategy is in cases where large databases of images must be sorted through remotely. For instance, imagine that a remote server contains the photographs of all students at a particular university. If a remote user wants to find a particular student by recognizing their face, then that user might appreciate having the useful portion of each image appear as quickly as possible. If foveated images are sent initially, then the user can rapidly reject faces which don't match and move on to the next student in the database. When a potential match is encountered, the user can wait for the entire uniform-resolution image to be rendered.