Implementation

Encoding Method

Of the encoding approaches described in the literature, the multiresolution methods appear to offer the most flexibility. In particular, Gaussian and Laplacian multiresolution pyramids are relatively intuitive while accommodating an arbitrary spatial-resolution weighting function. This study therefore uses Gaussian and Laplacian pyramids as the foundation for foveating images.

To generate a foveated image, the original uniform-resolution image is first decomposed into a Gaussian pyramid of several levels. Level 1 is the full-size image, and each higher-numbered level is a blurred, subsampled (smaller) copy of the level below it. The Gaussian pyramid is used because several stages of the foveation process require blurred versions of the original image. Before transmission, the appropriate subtractions are assumed to be performed to convert it into a Laplacian pyramid, in which each level stores the difference between a Gaussian level and an upsampled copy of the next coarser level; this form compresses better.
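
The decomposition can be illustrated with a minimal Python sketch. It is not the code used in this study: images are plain lists of rows, and the 5-tap binomial kernel, clamped borders, factor-of-two subsampling, and the replicate-then-blur expand step are all assumptions, as are the helper names (gaussian_pyramid, laplacian_pyramid, reconstruct). It builds the Gaussian pyramid, forms the Laplacian levels by the subtractions described above, and rebuilds the original image from them.

```python
# Sketch: Gaussian/Laplacian pyramids for a grayscale image stored as a list of rows.
# Assumptions (not from the text): 5-tap binomial kernel, clamped borders,
# factor-of-two subsampling, and an expand step that replicates pixels, then blurs.

KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _blur_rows(img):
    """Convolve every row with KERNEL, clamping column indices at the borders."""
    w = len(img[0])
    return [[sum(k * row[min(max(c + i - 2, 0), w - 1)] for i, k in enumerate(KERNEL))
             for c in range(w)] for row in img]


def _transpose(img):
    return [list(col) for col in zip(*img)]


def blur(img):
    """Separable 2-D blur: filter the rows, then the columns."""
    return _transpose(_blur_rows(_transpose(_blur_rows(img))))


def reduce_level(img):
    """Blur, then keep every second row and column (the next, coarser level)."""
    return [row[::2] for row in blur(img)[::2]]


def expand_level(img, h, w):
    """Upsample to h x w by pixel replication, then blur away the blockiness."""
    return blur([[img[r // 2][c // 2] for c in range(w)] for r in range(h)])


def gaussian_pyramid(img, levels):
    """pyr[0] is level 1 (full size); each later entry is about half the size."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(reduce_level(pyr[-1]))
    return pyr


def laplacian_pyramid(gpyr):
    """Level k holds G_k - expand(G_{k+1}); the coarsest Gaussian level is kept as-is."""
    lpyr = []
    for fine, coarse in zip(gpyr, gpyr[1:]):
        e = expand_level(coarse, len(fine), len(fine[0]))
        lpyr.append([[f - x for f, x in zip(fr, er)] for fr, er in zip(fine, e)])
    lpyr.append(gpyr[-1])
    return lpyr


def reconstruct(lpyr):
    """Invert laplacian_pyramid by adding each band back onto the expanded coarser image."""
    img = lpyr[-1]
    for band in reversed(lpyr[:-1]):
        e = expand_level(img, len(band), len(band[0]))
        img = [[b + x for b, x in zip(br, er)] for br, er in zip(band, e)]
    return img
```

Since reconstruct applies the same expand_level used by laplacian_pyramid, the round trip is exact up to floating-point rounding whichever expand filter is chosen; an 8 x 8 image decomposed into three levels yields Gaussian levels of 8, 4, and 2 rows.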

Next, a weighting function is computed from equation (1) above in several steps. First, the distance from the observer to each point on the image is calculated, using a nominal viewing distance and assuming a standard monitor pixel size. This distance determines the maximum spatial frequency, in cycles per degree (cpd), that the monitor can reproduce at each pixel (maxfreq). Next, with ec denoting the eccentricity of each pixel (its angular distance from the point of gaze), the equation

  eyefreq = ((epsilon2 ./(alpha*(ec+epsilon2))).*log(1/CT0));
is used to determine the maximum spatial frequency eyefreq (in cpd) resolvable by the eye at each point in the image, assuming a known, fixed point of gaze. This equation is simply equation (1) rearranged and solved for the frequency f. Finally, the maximum frequency the monitor can reproduce is divided, pixel by pixel, by the maximum frequency the visual system can resolve:
  pyrlevel = maxfreq ./ eyefreq;
The result, pyrlevel, is the (fractional) pyramid level that must be sent at each point in the image.
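
Equation (1) is not reproduced in this section. Assuming it has the exponential contrast-threshold form in the first line below (an inference from the eyefreq expression, with f_c standing for eyefreq, e for ec, and e_2 for epsilon2), the eyefreq expression follows by setting the contrast threshold to its maximum value of 1, the highest contrast a display can present, and solving for f:

```latex
\begin{aligned}
CT(f, e) &= CT_0 \exp\!\left(\alpha f \,\frac{e + e_2}{e_2}\right) \\
1 &= CT_0 \exp\!\left(\alpha f_c \,\frac{e + e_2}{e_2}\right) \\
\ln\frac{1}{CT_0} &= \alpha f_c \,\frac{e + e_2}{e_2} \\
f_c(e) &= \frac{e_2}{\alpha\,(e + e_2)}\,\ln\frac{1}{CT_0}
\end{aligned}
```

Under this form the resolvable cutoff frequency falls with eccentricity and is exactly half its foveal value when e = e_2.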

The pyrlevel matrix may contain values less than 1. This occurs where the maximum spatial frequency resolvable by the eye exceeds the maximum frequency the screen can display. Since the display is then the limiting factor, pyrlevel is clipped from below at 1.0, which corresponds to the highest-resolution pyramid level. Conversely, pyrlevel may contain values larger than the number of the coarsest pyramid level actually computed; in this case it is clipped from above to stay within the bounds of the computed pyramid.
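
The per-pixel computation of maxfreq, ec, eyefreq, and the clipped pyrlevel can be sketched in Python as follows. Several details are assumptions rather than statements from the text: the eye is placed on the normal through the image centre, pixels are square with side pitch_cm, screen obliquity is ignored, maxfreq is taken as the Nyquist limit of one cycle per two pixels, and the model parameters ALPHA, EPSILON2, and CT0 are illustrative values. The function names are hypothetical.

```python
import math

# Assumptions (not stated in this section):
#  * pixel (x, y) = (column, row); the eye sits view_dist_cm in front of the
#    image centre; pixels are square with side pitch_cm; screen obliquity ignored;
#  * illustrative parameter values for the contrast-threshold model of equation (1).
ALPHA = 0.106      # spatial-frequency decay constant
EPSILON2 = 2.3     # half-resolution eccentricity (degrees)
CT0 = 1 / 64       # minimum contrast threshold


def pixel_geometry(x, y, gaze, size, view_dist_cm, pitch_cm):
    """Return (maxfreq in cpd, eccentricity ec in degrees) for pixel (x, y)."""
    w, h = size

    def to_eye_coords(px, py):
        # 3-D position (cm) relative to the eye; the screen lies in the plane z = D.
        return ((px - (w - 1) / 2) * pitch_cm, (py - (h - 1) / 2) * pitch_cm, view_dist_cm)

    p, g = to_eye_coords(x, y), to_eye_coords(*gaze)
    dist = math.sqrt(sum(v * v for v in p))                   # eye-to-pixel distance
    pixel_deg = math.degrees(2 * math.atan(pitch_cm / (2 * dist)))
    maxfreq = 1 / (2 * pixel_deg)                             # Nyquist: two pixels per cycle
    cos_e = sum(a * b for a, b in zip(p, g)) / (dist * math.sqrt(sum(v * v for v in g)))
    ec = math.degrees(math.acos(max(-1.0, min(1.0, cos_e))))  # angle from the gaze direction
    return maxfreq, ec


def eyefreq(ec):
    """Highest frequency (cpd) resolvable at eccentricity ec: equation (1) solved for f."""
    return (EPSILON2 / (ALPHA * (ec + EPSILON2))) * math.log(1 / CT0)


def pyrlevel(maxfreq, ec, n_levels):
    """Fractional pyramid level maxfreq / eyefreq, clipped to [1, n_levels]."""
    return min(max(maxfreq / eyefreq(ec), 1.0), float(n_levels))
```

With a 60 cm viewing distance and 0.03 cm pixels, the sketch gives roughly 17.5 cpd at the screen centre, below the eye's foveal limit under these parameters, so pyrlevel is clipped to 1 there. The sketch keeps the text's ratio maxfreq / eyefreq as the level; if each pyramid level halves the resolution, a ratio of 4 strictly corresponds to level 3, so the mapping 1 + log2(maxfreq / eyefreq) may be worth comparing.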

This produces a matrix of floating-point values between 1 and the number of computed pyramid levels. To use this matrix, each pyramid level is upsampled (with blurring) to the size of the original image, and the resulting images are stacked into a 3-D dataset. Each pixel of the foveated image is then computed by linear interpolation, along the level axis of this dataset, between the two levels that bracket the pixel's pyrlevel value; for example, a pyrlevel of 2.25 yields 0.75 times level 2 plus 0.25 times level 3.
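
The interpolation step can be sketched as below, assuming each pyramid level has already been upsampled to full size (levels_full plays the role of the 3-D dataset) and that pyrlevel has already been clipped. The function name foveate and the list-of-rows layout are hypothetical, chosen only for illustration.

```python
import math


def foveate(levels_full, pyrlevel):
    """Blend pyramid levels pixel by pixel according to fractional pyrlevel values.

    levels_full[k - 1] is pyramid level k, already upsampled to full size.
    pyrlevel[r][c] must already be clipped to [1, len(levels_full)].
    """
    n = len(levels_full)
    if n < 2:
        raise ValueError("need at least two pyramid levels to interpolate")
    out = []
    for r, row in enumerate(pyrlevel):
        out_row = []
        for c, p in enumerate(row):
            lo = min(math.floor(p), n - 1)   # finer bracketing level (1-based)
            t = p - lo                        # weight given to the coarser level lo + 1
            finer = levels_full[lo - 1][r][c]
            coarser = levels_full[lo][r][c]
            out_row.append((1 - t) * finer + t * coarser)
        out.append(out_row)
    return out
```

Capping lo at n - 1 lets a pyrlevel equal to the top level select that level with full weight instead of indexing past the end of the dataset.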

In this manner, a foveated image is computed without the abrupt transitions between pyramid levels that are common in the foveation methods described in the literature. However, this technique is poorly suited to applications where computational speed is a priority: every pyramid level must be upsampled to full size and an interpolation through the 3-D dataset is required at each pixel, so the computation is nontrivial. On the other hand, where remote transmission is the severely limiting factor, the receiving processor may have substantial idle time for this computation while it waits for new data.

Color fidelity is maintained by using the YCbCr transformation. When an RGB image is read in, it is converted to the YCbCr color space in order to separate luminance information from chrominance information. As long as all three channels are treated identically, the choice of color space makes little difference to the foveated result. However, if chrominance is to be foveated to a greater degree than luminance, the YCbCr color space is the preferred choice.
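
For reference, a sketch of the conversion and its inverse follows. The text does not specify which YCbCr variant was used; the coefficients below are the full-range ITU-R BT.601 values used by JPEG/JFIF for 8-bit data, which is an assumption (studio-range variants instead scale Y to 16-235).

```python
# Assumption: full-range BT.601 (JPEG/JFIF) coefficients, 8-bit values in 0-255.


def rgb_to_ycbcr(r, g, b):
    """Separate luminance (Y) from chrominance (Cb, Cr); chroma is offset by 128."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform, applied after each channel has been foveated."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```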

Compression through Foveation

To use the foveation framework described above for compression, a standard scheme for transmitting the necessary data must be defined. Specifically, only those portions of the multiresolution pyramid that are actually needed should be transmitted. Fortunately, the pyrlevel matrix already computed determines which pixels are needed for the linear interpolation. So that the receiver can compute its own copy of pyrlevel for reconstruction, the transmitter must send the x- and y-coordinates of the foveation point, the nominal viewing distance, and the size of the original image. Given this small amount of information (together with the assumed standard pixel size), pyrlevel is uniquely determined.
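
The header the receiver needs is small. The sketch below packs it with Python's struct module; the field order, types, and units are a hypothetical layout, not one defined in the text.

```python
import struct

# Hypothetical header layout (not specified in the text): little-endian, no padding;
# fovea x and y as unsigned 16-bit pixel coordinates, viewing distance as a
# 32-bit float (cm), image width and height as unsigned 16-bit integers.
HEADER = struct.Struct("<HHfHH")


def pack_header(fovea_x, fovea_y, view_dist_cm, width, height):
    return HEADER.pack(fovea_x, fovea_y, view_dist_cm, width, height)


def unpack_header(data):
    """Recover the parameters the receiver needs to rebuild pyrlevel."""
    return HEADER.unpack(data[:HEADER.size])
```

Under this layout the header occupies 12 bytes. The pixel size is left out because it is assumed to be a standard value known to both ends.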

Next, the transmitter sends each level of the pyramid, taking care not to transmit unnecessary information. For each pixel in each pyramid level, the corresponding pyrlevel values are checked: if pyrlevel at that location is large enough that the pixel will not be used in the linear interpolation, the pixel is not transmitted. Since the receiver holds an identical copy of pyrlevel, it can determine on its own which pixels will be sent. The transmitter and receiver therefore stay synchronized without spending any bandwidth on describing which pixels are sent.
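
A sketch of how both ends might derive identical send masks from pyrlevel follows, assuming Laplacian levels and the linear interpolation between bracketing levels described earlier. Reconstructing Gaussian level k requires Laplacian level k and all coarser levels, so a level-k coefficient is needed wherever the finest level used nearby is level k or finer, that is, wherever pyrlevel < k + 1. The function name send_masks and the margin parameter, a hypothetical allowance for the support of the upsampling filter, are assumptions.

```python
import math


def send_masks(pyrlevel, n_levels, margin=1):
    """Return one boolean mask per Laplacian level (1-based level k -> masks[k - 1]).

    A level-k coefficient is marked if some full-resolution pixel within its footprint,
    widened by `margin` level-k pixels, has pyrlevel < k + 1.
    Transmitter and receiver run this same function on the same pyrlevel matrix.
    """
    H, W = len(pyrlevel), len(pyrlevel[0])
    masks = []
    for k in range(1, n_levels + 1):
        s = 2 ** (k - 1)                       # footprint of one level-k pixel, in full-res pixels
        rows, cols = math.ceil(H / s), math.ceil(W / s)
        mask = []
        for i in range(rows):
            r0, r1 = max(0, (i - margin) * s), min(H, (i + 1 + margin) * s)
            row_mask = []
            for j in range(cols):
                c0, c1 = max(0, (j - margin) * s), min(W, (j + 1 + margin) * s)
                finest = min(pyrlevel[r][c] for r in range(r0, r1) for c in range(c0, c1))
                row_mask.append(finest < k + 1)
            mask.append(row_mask)
        masks.append(mask)
    return masks
```

The coarsest level is always marked in full, since pyrlevel never exceeds the number of levels.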

In addition to the pyramid encoding, the image data should be entropy coded before transmission using a sequentially decodable code such as Huffman coding. Because each codeword can be decoded as soon as its bits arrive, the receiver can unpack the image data progressively, without waiting for the entire stream to be received. Combined with Laplacian pyramid encoding, this should yield substantially compressed images.
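
The streaming property can be illustrated with a compact Huffman coder in Python. This is a textbook construction, not the coder used in this study; the symbols stand for quantized pyramid coefficients, and bits are represented as a string of '0' and '1' characters for readability. decode_stream emits each symbol as soon as its codeword is complete, which is what lets the receiver unpack data as it arrives.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                        # keeps heap comparisons away from the dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                      # repeatedly merge the two rarest subtrees
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode_stream(bits, code):
    """Yield symbols as soon as each codeword is complete (prefix property)."""
    inverse = {b: s for s, b in code.items()}
    buf = ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            yield inverse[buf]
            buf = ""
```

In a real transmitter the code table (or the symbol counts) would also have to reach the receiver, unless both ends agree on a fixed table in advance.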

If the data are sent starting from the lowest-resolution level, the receiving computer can render the image progressively as it arrives. This lets the user monitor the progress of the download and see gross image attributes before all of the data have been transmitted.
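
The coarse-to-fine order can be sketched as a generator that groups the coefficients to be sent by level, starting with the coarsest, so the receiver can redraw a sharper image after each batch. The mask layout (masks[k - 1] holding the boolean send mask for level k, computed identically at both ends) and the function name are assumptions.

```python
def coarse_to_fine(masks):
    """Yield (level, [(row, col), ...]) batches, coarsest level first.

    masks[k - 1] is the boolean send mask for 1-based level k. After each batch
    the receiver can reconstruct and redraw a progressively sharper image.
    """
    for k in range(len(masks), 0, -1):
        coords = [(i, j) for i, row in enumerate(masks[k - 1])
                  for j, needed in enumerate(row) if needed]
        yield k, coords
```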

Fovea-First Transmission

An extension of the above compression algorithm transmits the remaining portions of the multiresolution pyramid only after the foveated pyramid has been sent. In this way, the important details of the image arrive sooner, and the remaining full-resolution image data follow afterwards.

Using the Laplacian pyramid, this approach amounts to rearranging the order of transmission relative to a "standard" transmission in which each pyramid level is sent in turn. The algorithms described in Compression through Foveation above are first used to transmit the foveated image rapidly. Once the foveated image is complete, the image server returns to the lowest-resolution pyramid levels and transmits the data that were skipped in the first pass. The receiver follows the same procedure and incrementally updates the received image; when all of the data have been transmitted, the image is rendered at full resolution. The difference in file size between this algorithm and a standard Laplacian pyramid encoding is negligible, since the same data are merely sent in a different order. The only additional information required is the few values that specify the fovea location and nominal viewing distance.
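
The fovea-first ordering can be sketched the same way: a first pass sends the masked (foveated) coefficients coarse-to-fine, and a second pass returns to the coarsest level and sends everything skipped. The mask layout (masks[k - 1] for 1-based level k) and the function name are hypothetical. Because every coefficient appears exactly once across the two passes, the payload matches sequential transmission and is only reordered.

```python
def fovea_first(masks):
    """Yield (level, row, col) in fovea-first order.

    Pass 1 sends the coefficients marked in the send masks (the foveated image),
    coarsest level first; pass 2 returns to the coarsest level and sends every
    coefficient skipped in pass 1. Each coefficient is sent exactly once.
    """
    for wanted in (True, False):
        for k in range(len(masks), 0, -1):
            for i, row in enumerate(masks[k - 1]):
                for j, needed in enumerate(row):
                    if needed == wanted:
                        yield (k, i, j)
```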

Bill Overall / March 12, 1999