
Encoding Method
Of the encoding approaches described in the literature, multiresolution
methods appear to offer the most flexibility. In particular, Laplacian
and Gaussian multiresolution pyramids are relatively intuitive and
allow an arbitrary spatial-resolution weighting function. This study
therefore uses Laplacian and Gaussian multiresolution pyramids as the
foundation for foveating images.
To generate a foveated image, the original uniform-resolution image is
first decomposed into a Gaussian pyramid of several levels. The levels
are numbered so that level 1 is the full-sized image and each
successively smaller image occupies the next higher-numbered level. A
Gaussian pyramid is used because blurred versions of the original
image are required at several points in the foveation process. Before
transmission, it is assumed that the expanded version of each coarser
level is subtracted from the level above it, producing a Laplacian
pyramid that compresses better.
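The pyramid construction just described can be sketched in plain Python, using lists of lists and no external libraries. This is an illustrative sketch, not the implementation used in this study. The 5-tap binomial kernel, the clamped image borders, the replicate-then-blur expand step, and all function names are assumptions made here for simplicity. Index 0 of each returned list corresponds to level 1 in the text.

```python
# Hypothetical helpers: a minimal Gaussian/Laplacian pyramid in pure Python.
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # assumed 5-tap binomial blur


def blur1d(row):
    """Convolve a 1-D list with KERNEL, clamping indices at the borders."""
    n = len(row)
    return [sum(w * row[min(max(i + k - 2, 0), n - 1)] for k, w in enumerate(KERNEL))
            for i in range(n)]


def blur(img):
    """Separable 2-D blur: filter every row, then every column."""
    rows = [blur1d(r) for r in img]
    cols = [blur1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def pyr_reduce(img):
    """Blur, then keep every second row and column (next coarser level)."""
    return [row[::2] for row in blur(img)[::2]]


def pyr_expand(img, shape):
    """Upsample to `shape` by pixel replication, then blur."""
    h, w = shape
    return blur([[img[y // 2][x // 2] for x in range(w)] for y in range(h)])


def gaussian_pyramid(img, levels):
    """List of levels; index 0 is the full-resolution image (level 1)."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(pyr_reduce(pyr[-1]))
    return pyr


def laplacian_pyramid(gauss):
    """Subtract each expanded coarser level from the finer one; keep the top."""
    lap = []
    for fine, coarse in zip(gauss, gauss[1:]):
        up = pyr_expand(coarse, (len(fine), len(fine[0])))
        lap.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(fine, up)])
    lap.append(gauss[-1])
    return lap


def reconstruct(lap):
    """Invert laplacian_pyramid: add back expanded levels, coarsest first."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        up = pyr_expand(img, (len(band), len(band[0])))
        img = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(band, up)]
    return img
```

The decoder uses the same `pyr_expand` step as the encoder. As a result, `reconstruct` recovers the original up to floating-point round-off whatever filter is chosen, and this is what allows Laplacian differences to be transmitted in place of Gaussian levels.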
Next, a weighting function is computed from equation (1) in several
steps. First, the distance from the observer to each point on the
image is calculated, using a nominal viewing distance and a standard
monitor pixel size. This distance is then used to determine the
maximum spatial frequency, in cycles per degree (cpd), that the
monitor can reproduce at each pixel (maxfreq).
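The per-pixel weighting computation can be sketched in plain Python. The function `pyrlevel_map` is hypothetical and not a name used in this study. It follows the steps in the text:

- maxfreq from the viewing geometry;
- eyefreq from equation (1) solved for f;
- their ratio, pyrlevel;
- clipping of pyrlevel to the computed pyramid levels.

Several assumptions are made that the text does not state:

- the observer's eye lies on the line through the image center perpendicular to the screen;
- the monitor can display at most one cycle per two pixels (the Nyquist limit);
- foreshortening of off-center pixels is ignored;
- the eccentricity ec is the angle, in degrees, between the lines of sight to the gaze point and to the pixel.

The model parameters alpha, epsilon2, and CT0 are passed in rather than given values.

```python
import math


def _angle_deg(u, v):
    """Angle between two 3-D vectors in degrees (atan2 form is stable near 0)."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    return math.degrees(math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot))


def pyrlevel_map(height, width, gaze, view_dist_cm, pitch_cm,
                 alpha, epsilon2, ct0, num_levels):
    """Hypothetical sketch: clipped fractional pyramid level for every pixel.

    gaze is (row, col) of the fixation point; level 1 is full resolution.
    Assumes the eye sits on the axis through the image center.
    """
    cy, cx = (height - 1) / 2, (width - 1) / 2
    gaze_vec = ((gaze[1] - cx) * pitch_cm, (gaze[0] - cy) * pitch_cm, -view_dist_cm)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            v = ((x - cx) * pitch_cm, (y - cy) * pitch_cm, -view_dist_cm)
            r = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)  # eye-to-pixel distance
            # Visual angle of one pixel at distance r (foreshortening ignored).
            pix_deg = math.degrees(2 * math.atan(pitch_cm / (2 * r)))
            maxfreq = 1.0 / (2.0 * pix_deg)        # assumed: 2 pixels per cycle
            ec = _angle_deg(gaze_vec, v)           # eccentricity in degrees
            eyefreq = (epsilon2 / (alpha * (ec + epsilon2))) * math.log(1.0 / ct0)
            level = maxfreq / eyefreq              # pyrlevel = maxfreq ./ eyefreq
            # Clip to the pyramid that was actually computed.
            row.append(min(max(level, 1.0), float(num_levels)))
        out.append(row)
    return out
```

Clipping is done per pixel so that the returned values can be used directly to interpolate between pyramid levels.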
Next, the equation
    eyefreq = ((epsilon2 ./ (alpha*(ec + epsilon2))) .* log(1/CT0));
is used to determine the maximum spatial frequency eyefreq (in cpd)
resolvable by the eye at each point in the image, assuming a known,
fixed point of gaze. Here ec is the eccentricity of each pixel relative
to the point of gaze, and alpha, epsilon2, and CT0 are the parameters
of equation (1); the equation is simply equation (1) solved for the
spatial frequency f. Finally, the maximum frequency that the monitor
can reproduce is divided by the maximum frequency resolvable by the
visual system at each pixel:
    pyrlevel = maxfreq ./ eyefreq;
The result, pyrlevel, is the (fractional) pyramid level that must be
sent at each point in the image.
The pyrlevel matrix may contain values less than 1. This occurs when
the maximum spatial frequency resolvable by the eye is greater than
the maximum frequency that the screen can display. Since the display
is then the limiting factor, pyrlevel is clipped at 1.0, which
corresponds to the highest-resolution pyramid level. Conversely,
pyrlevel may contain values larger than the highest pyramid level that
was computed; in this case, pyrlevel is clipped to that level so that
it stays within the bounds of the computed pyramid.
The result is a matrix of floating-point values between 1 and the
number of computed pyramid levels. To use this matrix, each pyramid
level is upsampled and blurred so that every pyramid image is the same
size as the original. Together, these images form a 3-D dataset. The
value at each point in the foveated image is then computed by linear
interpolation between the two images in this dataset whose levels
bracket the floating-point pyrlevel value at that point.
In this manner, a foveated image is computed without the abrupt
transitions between pyramid levels that are common in previously
published methods. That said, this technique should not be used where
computational speed is a priority: an interpolation through the 3-D
dataset is required at each pixel, so the computation is nontrivial.
However, when remote transmission is a severely limiting factor, the
receiving processor may have substantial idle time for this
computation while it waits for new data.
Color fidelity is maintained by using the YCbCr transformation. When
an RGB image is read in, it is converted to the YCbCr color space in
order to separate luminance information from chrominance information.
As long as every channel is treated identically, the choice of color
space should make little difference to the foveation. However, if
chrominance information is to be foveated more strongly than luminance
information, YCbCr is the preferred color space, because it separates
the two.
Compression through Foveation
To use the foveation framework described above for compression, a
standard scheme for transmitting the necessary data must be defined.
Specifically, only those portions of the multiresolution pyramid that
are strictly necessary should be transmitted. Conveniently, the
pyrlevel structure has already been computed, and it can be used to
determine which pixels are necessary for the linear interpolation. So
that the receiver can compute its own pyrlevel structure for
reconstruction, the x- and y-coordinates of the foveation point must
be transmitted, along with the nominal viewing distance and the size
of the original image. With only this small amount of information, the
pyrlevel structure should be uniquely determined.
Next, the transmitter sends each level of the pyramid, taking care not
to transmit unnecessary information. To do this, pyrlevel is checked
for each pixel in each pyramid level. If the pyrlevel value at the
corresponding pixel location is large enough that the pixel in
question will not be required for the linear interpolation, that pixel
is not transmitted. Since the receiver also has a copy of the pyrlevel
matrix, it can determine which pixels will be sent. The transmitter
and receiver therefore remain synchronized without spending extra
bandwidth on identifying which pixels are sent.
In addition to the pyramid encoding, the image data should be
compressed before transmission with a sequential entropy code such as
Huffman coding. With such a code, the receiver can decode the image
data progressively as it arrives, without waiting for all of the data
to be received. Combined with Laplacian pyramid encoding, this should
produce substantially compressed images.
If the data are sent starting at the lowest-resolution level, the
receiving computer can render the image progressively as it arrives.
This lets the user monitor the progress of the download and see gross
image attributes before all of the data have been transmitted.
Fovea-First Transmission
An extension of the above compression algorithm transmits the
remaining portions of the multiresolution pyramid only after the
foveated pyramid has been transmitted. In this way, the important
details of the image arrive more rapidly, and the remaining
full-resolution image data follow afterwards. With a Laplacian
pyramid, this approach simply rearranges the order of data
transmission relative to a "standard" transmission in which each
pyramid level is sent in turn. The algorithms described in Compression
through Foveation above are first used to transmit the foveated image
rapidly. After the foveated image is complete, the image server
returns to the lowest-resolution pyramid levels and transmits the data
that were omitted in the first pass. The receiver follows the same
schedule and incrementally updates the received image. When all of the
data have been transmitted, the image is rendered at full resolution.
The difference in file size between this algorithm and a standard
Laplacian pyramid encoding is negligible, since the data have merely
been sent in a different order. The only additional information
required is the few values that specify the fovea location and nominal
viewing distance.
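The pixel-selection rule and the two-pass fovea-first order can be sketched in plain Python. The function names are hypothetical. The rule for deciding whether a pixel is "required" is an assumption consistent with the text, not necessarily the study's exact criterion. It rests on three observations:

- Linear interpolation at a fractional level p uses Gaussian levels floor(p) and ceil(p).
- Rebuilding Gaussian level floor(p) from a Laplacian pyramid needs Laplacian levels floor(p) and coarser.
- Therefore Laplacian level k is sent wherever p < k + 1.

Each coarse pixel is tested against the smallest pyrlevel value in the block of full-resolution pixels it covers. A practical coder would probably also widen that block to allow for the spread of the upsampling filter.

```python
def needed_mask(pyrlevel, level, level_shape):
    """Hypothetical rule: True where Laplacian `level` (1 = finest) must be sent."""
    h, w = level_shape
    step = 2 ** (level - 1)  # full-resolution pixels per level pixel, per axis
    H, W = len(pyrlevel), len(pyrlevel[0])
    mask = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [pyrlevel[y][x]
                     for y in range(i * step, min((i + 1) * step, H))
                     for x in range(j * step, min((j + 1) * step, W))]
            # Needed if any covered pixel interpolates from Gaussian level
            # floor(p) <= level, i.e. p < level + 1.
            row.append(min(block) < level + 1)
        mask.append(row)
    return mask


def fovea_first_order(pyrlevel, level_shapes):
    """Two passes of (level, row, col), each coarsest level first.

    Pass 1 carries the foveated image; pass 2 carries everything else.
    """
    masks = [needed_mask(pyrlevel, k + 1, s) for k, s in enumerate(level_shapes)]
    coarse_to_fine = range(len(level_shapes) - 1, -1, -1)
    first = [(k + 1, i, j) for k in coarse_to_fine
             for i, row in enumerate(masks[k]) for j, m in enumerate(row) if m]
    second = [(k + 1, i, j) for k in coarse_to_fine
              for i, row in enumerate(masks[k]) for j, m in enumerate(row) if not m]
    return first, second
```

Both ends can compute pyrlevel from the foveation point, viewing distance, and image size. The receiver can therefore call the same function to learn which pixel each incoming value belongs to, which gives the synchrony described above without extra bandwidth.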