EE362 - JPEG 2000 and Wavelet Compression. S. Choo and G. Chew

JPEG 2000 and Wavelet Compression

Shuo-yen Choo and Gregory Chew

Motivation:

JPEG is the most widely used, general purpose image format. JPEG 2000 is the widely touted successor to JPEG.
Key feature: perceptually higher quality images at lower bit rates, using wavelet compression. There are proprietary wavelet-based formats, but JPEG2000 will be the first open standard to use wavelets.
It also adds other features, such as region of interest coding and progressive transmission.
So it's worthwhile to try to understand how JPEG2000 works, and the gains to be made from moving to multi-resolution, wavelet based compression.

Part 1. We'll give an overview of JPEG2000.

Part 2. We'll present experimental data comparing the performance of DCT with the performance of various wavelet-based transforms.

Part 1. JPEG 2000: Under the Hood

Image components and tiling.

JPEG 2000 supports multiple image components: 1 to 255 (or more), with bit depths of 1 to 32 bits. This allows multispectral imaging. JPEG2000 supports both lossy and lossless compression. [1]

Each component can have a different size and bit depth, and components can be aligned differently relative to each other. [2]

Figure 1. Position of the image component relative to the reference grid. [2]

Each image component is further broken down into Tiles. Tile sizes are variable, and can differ from component to component. Similar to blocks in JPEG, but more flexible.

Figure 2. Image tiles [3].

Wavelet compression

JPEG2000 uses wavelet transforms in the lossy stage of image compression. Wavelet transforms break down the image into multi resolution representations.

For JPEG2000, the wavelet transform is applied to the image on a tile by tile basis. We will look at wavelet compression in more detail in the second part of this presentation.

But for now, an overview of how it works:

In the one dimensional case, the signal is broken into subbands by passing it through a low pass filter and a high pass filter, and both subbands are downsampled by 2. The same procedure can then be applied iteratively to the low frequency subband, and repeated for as many levels of decomposition as desired. If the filters used satisfy certain properties, the original signal can be reconstructed by reversing the procedure.

Figure 3. 2 level subband decomposition and synthesis. [4]
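
The one-dimensional procedure above can be sketched in a few lines of Python. This is only an illustration: it assumes the Haar filter pair (the shortest filters that allow perfect reconstruction) and an even-length signal, and the function names are ours, not part of JPEG2000.

```python
import math

S = math.sqrt(2.0)

def analyze(x):
    """One level: low pass and high pass Haar filters, each downsampled by 2."""
    low = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Reverse one level of analyze()."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / S)
        x.append((a - d) / S)
    return x

def decompose(x, levels):
    """Re-apply analyze() to the low subband; returns [low_N, high_N, ..., high_1]."""
    low, highs = list(x), []
    for _ in range(levels):
        low, high = analyze(low)
        highs.append(high)
    return [low] + highs[::-1]

def reconstruct(bands):
    """Undo decompose() by synthesizing from the coarsest level outwards."""
    low = bands[0]
    for high in bands[1:]:
        low = synthesize(low, high)
    return low

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = decompose(signal, 2)      # 2 level decomposition, as in Figure 3
restored = reconstruct(bands)     # matches signal up to rounding error
```

Longer filters such as the Daubechies filters follow the same pattern, but need some rule for handling the signal boundaries, which this sketch avoids.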

In the 2 dimensional case, the decomposition is applied separably in the horizontal and vertical directions. This breaks a 2 dimensional signal down into four subbands, known as "LL", "LH", "HL", and "HH". Conceptually, for a particular image, they correspond respectively to a low-frequency approximation of the original, primarily horizontal edges, primarily vertical edges, and diagonal edges. Other decompositions are also supported (see [10]).

Figure 4. 2 Level Decomposition.
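
As a rough sketch of the separable 2 dimensional case, the following applies one level of a Haar split along the rows and then along the columns. The averaging form of the Haar filter and all function names are assumptions made for illustration. The test image has a single horizontal edge, so only LL and LH come out non-zero.

```python
def split(v):
    """Haar low/high pair on a 1-D list (pairwise averages and half-differences)."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high

def transpose(m):
    return [list(r) for r in zip(*m)]

def split_rows(img):
    """Apply split() to every row; returns (low halves, high halves)."""
    lows, highs = [], []
    for row in img:
        lo, hi = split(row)
        lows.append(lo)
        highs.append(hi)
    return lows, highs

def decompose_2d(img):
    # Horizontal pass: filter along each row.
    L, H = split_rows(img)
    # Vertical pass: filter along each column of both halves.
    LL, LH = [transpose(b) for b in split_rows(transpose(L))]
    HL, HH = [transpose(b) for b in split_rows(transpose(H))]
    return LL, LH, HL, HH

# A 4 by 4 image with one horizontal edge between the first two rows.
image = [
    [0, 0, 0, 0],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
]
LL, LH, HL, HH = decompose_2d(image)
```

Applying decompose_2d again to LL would give the second level of the decomposition.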

A quantization matrix is then applied to the decomposed image. Uniform quantization is performed within each subband, with different levels of quantization for each subband. Generally, we want to quantize the higher frequency subbands more coarsely, since humans have lower contrast sensitivity to high frequency information.
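
A minimal sketch of uniform quantization with one step size per subband is shown below. The step sizes and coefficient values are made up for illustration; they are not taken from JPEG2000 or from our experiments.

```python
def quantize(band, step):
    """Uniform quantizer: index of the nearest multiple of step."""
    return [[int(round(c / step)) for c in row] for row in band]

def dequantize(indices, step):
    """Map quantizer indices back to coefficient values."""
    return [[q * step for q in row] for row in indices]

# Hypothetical step sizes: coarser for the higher frequency subbands.
steps = {"LL": 2.0, "LH": 8.0, "HL": 8.0, "HH": 16.0}

# Hypothetical coefficients, one short row per subband.
subbands = {
    "LL": [[100.0, 97.5]],
    "LH": [[5.0, -11.0]],
    "HL": [[3.0, 13.0]],
    "HH": [[-7.0, 9.0]],
}

coded = {name: quantize(band, steps[name]) for name, band in subbands.items()}
decoded = {name: dequantize(coded[name], steps[name]) for name in coded}
```

With this kind of quantizer, the error in each reconstructed coefficient is at most half the step size of its subband, so the coarse steps concentrate the loss in the high frequency subbands.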

Figure 5. 2 level decomposition of baboon with Daubechies 4 wavelet (with false color for visibility)

It has been found that wavelet representations of an image generally perform better than DCT representations for lossy image compression, as there is less perceptual loss for the same bit rate. This is the case even when both are applied on the same block size.

It is believed that multi-resolution wavelet representations give better performance because:

Multi-resolution representations are more similar to how the human visual system represents images [5]. Consequently, better quantization matrices can be chosen, to more closely match and exploit the characteristics of the human visual system.
The wavelet basis functions are smoother than the DCT basis functions (which tend to be blocky), and are more natural and pleasing to the eye [6].

We will investigate this further in the second part of our report.

Region of Interest (ROI) coding

In ROI coding, portions of an image are stored at higher quality than the rest of the image. This is useful because we may care more about detail in some portions of an image than in others. For example, we've all experienced the unreadable text in the Stanford online videos. Similar applications exist in medical imaging, etc.

Figure 6. An example of ROI coding with a rectangular ROI mask [7].

ROI coding is straightforward when the image is stored compressed in a multi-resolution format.

1. We first start with a ROI mask, which marks out a region of the image we wish to store at higher quality.

Figure 7. ROI mask [8]

2. The wavelet coefficients corresponding to the transform of the mask have to be stored at higher quality (quantized less coarsely). We can do this by applying the transform to the mask, and looking at which coefficients fall in the mask.

Figure 8. Transformed ROI mask [8]
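
The mask propagation in step 2 can be sketched as follows. This assumes 2-tap Haar filters, for which each coefficient at the next level depends on exactly one 2 by 2 block of the level above; longer filters would make the mask grow further. The function name and the 8 by 8 example are hypothetical.

```python
def roi_mask_next_level(mask):
    """Mark every next-level coefficient that depends on at least one ROI sample.

    With 2-tap filters the same mask applies to all four subbands of the level.
    """
    rows, cols = len(mask), len(mask[0])
    return [
        [any(mask[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1))
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

# Rectangular ROI covering rows 3 to 4 and columns 2 to 5 of an 8 by 8 tile.
mask = [[3 <= r <= 4 and 2 <= c <= 5 for c in range(8)] for r in range(8)]

level1 = roi_mask_next_level(mask)    # 4 by 4 mask for the level 1 subbands
level2 = roi_mask_next_level(level1)  # 2 by 2 mask for the level 2 subbands
```

In this example the ROI covers an eighth of the tile, a quarter of the level 1 coefficients and all of the level 2 coefficients, which shows how the fraction of coefficients affected grows at the coarser levels.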

Note that mask information is not needed at the decoder.

There can be more than one ROI in the image. Note that disjoint masks can overlap in the wavelet domain due to filtering. When this occurs, the region of overlap can be stored at the quality of the mask with the highest quality [9].

Progressive transmission

Lower resolution coefficients of the multi-resolution decomposition can be transmitted first. This allows for progressive transmission and display of the image.

The decoder can display the image progressively by resolution (the image gets larger as more information is received), or progressively by quality.

Figure 9. Progressively by resolution [11].

Figure 10. Progressively by quality. L: 0.0625 bpp, R: 0.5 bpp [12]

Because of image tiling, coefficients from different tiles have to be gathered, so that the lower resolution coefficients are sent first.
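
The gathering step can be pictured as a reordering of per-tile data. The sketch below is only a conceptual illustration: the Packet record and its level numbering are hypothetical and do not reflect the actual JPEG2000 codestream syntax.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    tile: int      # index of the tile the coefficients came from
    level: int     # 0 = coarsest resolution, larger = finer detail
    subband: str

def resolution_progressive(packets):
    """Order packets so every tile's coarse data precedes any finer data."""
    return sorted(packets, key=lambda p: (p.level, p.tile))

# Data as produced tile by tile (two tiles, one decomposition level each).
stored = [
    Packet(0, 0, "LL"), Packet(0, 1, "HL"), Packet(0, 1, "LH"), Packet(0, 1, "HH"),
    Packet(1, 0, "LL"), Packet(1, 1, "HL"), Packet(1, 1, "LH"), Packet(1, 1, "HH"),
]
stream = resolution_progressive(stored)
```

After reordering, the LL data of both tiles comes first, so a decoder can show a complete low resolution image before any detail subbands arrive.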

Other forms of progression are possible, such as progression by image channel. [13]

Compressed domain image manipulation

Basic geometrical transformations can be applied (easily) on the compressed representation of the image. This eliminates the need to decompress and recompress the image for transformation. e.g. vertical and horizontal flipping, rotation by multiples of 90 degrees.

Figure 11. Vertical Flipping [14].

Figure 12. Rotation [15]
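
To see why flipping is cheap in the transform domain, consider one row of an image and a single level of the Haar transform (an assumption made for this sketch; the exact rule depends on the symmetry of the filters used). Flipping the row reverses the low pass coefficients, and reverses and negates the high pass coefficients, so no inverse transform is needed.

```python
def analyze(x):
    """One Haar level: pairwise averages (low) and half-differences (high)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def flip_in_transform_domain(low, high):
    """Coefficients of the mirrored row, computed from the coefficients alone."""
    return low[::-1], [-d for d in reversed(high)]

row = [3, 7, 1, 9, 4, 4, 8, 2]

direct = analyze(row[::-1])                            # flip, then transform
shortcut = flip_in_transform_domain(*analyze(row))     # transform, then flip
```

Both routes give the same coefficients for this row. A real codec would also have to apply the same reordering to the quantized and entropy coded data, which this sketch does not cover.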

Part 2. Wavelet Compression

We compared the quality of JPEG compressed images against the quality of images compressed with a variety of wavelet filters, in terms of the SNR and the subjective image quality.

We looked at 3 important classes of images: natural, synthetic and textual. 4 natural images, 3 synthetic images and 4 textual images were used. The images were all 256 by 256 in size.

Natural images

Figure 1. L to R, Top to Bottom: Lena, Barbara, Baboon, Einstein

Synthetic images

Figure 2. L to R, Top to bottom: Sinusoid 1 (1 cycle every 100 pixels), Sinusoid 2 (5 cycles every 100 pixels), Checker pattern, Square.

Text:

Figure 3. L to R, Top to bottom: Text 2, Text 3, Text 4

The filters used were Daubechies 1, 2, 4, 5, 8 and Symlets 2, 4, 5, 8. The Daubechies filters are popular in image processing. Symlets are less frequently seen, but perform surprisingly well (see later).

For the purpose of comparison with JPEG, the wavelet filters were applied on 8 by 8 blocks.

Procedure:

1. A JPEG Quality factor is selected, and the bit rate is calculated from the quantization matrix returned. The JPEG Quality factors used were 10, 15, 20, 25 and 30. Low quality factors were chosen because artifacts are easily perceptible only at low quality levels.

2. An appropriate quantization matrix with the same bit rate is selected, and applied to the wavelet transform coefficients.

3. The inverse transforms are performed. SNR and subjective image quality are compared.
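
For reference, one common way to compute an SNR for step 3 is sketched below. The definition used here (signal energy over error energy, in dB) is an assumption made for illustration; it is not necessarily the exact formula used in our Matlab code.

```python
import math

def snr_db(original, reconstructed):
    """10 * log10(signal energy / error energy) for two equally sized images."""
    signal = sum(p * p for row in original for p in row)
    noise = sum(
        (p - q) ** 2
        for orow, rrow in zip(original, reconstructed)
        for p, q in zip(orow, rrow)
    )
    if noise == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(signal / noise)

# Tiny hypothetical example: signal energy 3000, error energy 3.
original = [[10, 20], [30, 40]]
reconstructed = [[11, 19], [30, 41]]
value = snr_db(original, reconstructed)   # about 30 dB
```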

Quantization

JPEG2000 does not specify the use of particular quantization matrices. A way of calculating a quantization matrix for a particular filter is suggested, but the formula given seems fairly arbitrary (rather than being based on empirical experimental data) [16]

In "Visibility of wavelet quantization noise" [4], the authors perform experiments with human subjects to determine the perceptually lossless quantization matrix for the 9/7 biorthogonal filter.

However, in general, there does not seem to be much data on what the perceptually optimal quantization matrix for a given filter is. Therefore, we have elected, for simplicity, to use a single quantization matrix for all the wavelet filters in this experiment (although this limits the validity of the comparison somewhat).

The Quantization matrix used for all the wavelet filters was

     8     7     8     8    34    34    34    34
     7     7     8     8    34    34    34    34
     8     8    12    12    34    34    34    34
     8     8    12    12    34    34    34    34
    34    34    34    34    55    55    55    55
    34    34    34    34    55    55    55    55
    34    34    34    34    55    55    55    55
    34    34    34    34    55    55    55    55

scaled to obtain a particular bitrate. This was the quantization matrix used in [17].
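
The effect of scaling the matrix can be sketched as follows. The scale factors and the constant test block are hypothetical, and the count of non-zero indices is used only as a crude stand-in for bit rate; the actual bit rate depends on the entropy coder.

```python
BASE_Q = [
    [8, 7, 8, 8, 34, 34, 34, 34],
    [7, 7, 8, 8, 34, 34, 34, 34],
    [8, 8, 12, 12, 34, 34, 34, 34],
    [8, 8, 12, 12, 34, 34, 34, 34],
    [34, 34, 34, 34, 55, 55, 55, 55],
    [34, 34, 34, 34, 55, 55, 55, 55],
    [34, 34, 34, 34, 55, 55, 55, 55],
    [34, 34, 34, 34, 55, 55, 55, 55],
]

def scaled(q, factor):
    """Multiply every step size by the same factor."""
    return [[v * factor for v in row] for row in q]

def quantize_block(coeffs, q):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[int(round(c / s)) for c, s in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, q)]

def nonzero_count(indices):
    return sum(1 for row in indices for v in row if v != 0)

# Hypothetical 8 by 8 block of transform coefficients, all equal to 41.
block = [[41.0] * 8 for _ in range(8)]

counts = {f: nonzero_count(quantize_block(block, scaled(BASE_Q, f)))
          for f in (1, 2, 4)}
```

A larger scale factor zeroes out more coefficients, starting with the positions that have the largest step sizes, and so lowers the bit rate at the cost of quality.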

Results

Detailed results are available in the Matlab .mat files and as image files. Please refer to the Appendix.

Plots of the Average SNR against the Bit Rate are listed in the Appendix.
(It may be hard to read the plots from the gifs. Load the .mat file and use makeplots.m to plot the snr data.)

We present a summary of the results obtained here:

In all cases, for the same bit rate, the wavelet filters produced images with significantly higher SNR and perceptual image quality than JPEG.
There was one exception: for the high frequency sinusoid (Sinusoid 2), the SNR for JPEG started low, but increased rapidly with bitrate and eventually overtook the wavelet transforms. This may be because the sinusoid chosen matches a DCT basis function, so that as the quantization becomes finer, the SNR increases dramatically with the coefficient accuracy.
In terms of SNR, different wavelet filters performed differently for different images, but generally the difference between wavelet filters was not great.
For natural images, wavelets showed an improvement of around 3 to 6 dB over JPEG. Symlet 5 generally performed best, or close to the best, for natural images. For synthetic and textual images, the SNR improvement for wavelet filters was even more dramatic (in the 5 to 10 dB range), and db1 generally performed best.
For several of the synthetic and textual images (square, checker, text1 and text3), the SNR obtained by db1 was extremely high and non-monotonic with the bitrate (peaking at around 1.2 bpp). A possible explanation is that for some quantization matrix values, the wavelet coefficient is a multiple of the quantization scale factor (and this is more likely for short filters and synthetic images), but we're not too sure.
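
The possible explanation in the last point can be illustrated with a single coefficient. The value 24 and the step sizes are made up, and this only shows that the mechanism is plausible; it does not confirm that it is what happened in our experiment.

```python
def quantization_error(value, step):
    """Absolute error after uniform quantization with the given step size."""
    return abs(value - round(value / step) * step)

coefficient = 24.0   # hypothetical wavelet coefficient

# Step sizes listed from coarse to fine.
errors = {step: quantization_error(coefficient, step) for step in (10, 8, 7, 6, 5)}
```

The error is zero whenever the step size divides the coefficient exactly (steps 8 and 6 here), while the finer step 7 gives a larger error than the coarser step 8. If many coefficients of a synthetic image share a few values, the SNR can therefore peak at particular scale factors instead of rising steadily with the bit rate.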

Sample images

Figure 4. Barbara JPEG. SNR 48.9, 1.76 bpp (Quality 30)

Figure 5. Barbara. db4. SNR 69.4, 1.76 bpp

Figure 6. Barbara. Symlet 5. SNR 79.4, 1.76 bpp

Figures 4 to 6 show the Barbara image at 1.76 bpp. Symlet 5 is the best performing wavelet transform (highest SNR) for this image, and db4 is the worst performing wavelet transform (lowest SNR). Both look better than the JPEG compressed image, and Symlet 5 looks better than db4.

The artifacts in wavelet compressed natural images tend to appear as fine lines with some gradation of color within each block (even at low bit rates), as opposed to the severe blocking in JPEG compressed images.

Figure 7. Barbara. Symlet 5. SNR 44.9, 0.67 bpp

Figure 8. Barbara. db4. SNR 44.7, 0.91 bpp

Figures 7 and 8 show the Barbara image compressed by the symlet 5 and db4 filters. The SNR was made (nearly) equal in both the images, but the Symlet 5 image looks better. This shows that there's a perceptual difference between different wavelet filters, which cannot be characterized by the SNR alone.

Figure 9. Text4. JPEG. SNR 63.2. 1.76 bpp

Figure 10. Text 4. db 5. SNR 235.5. 1.76 bpp

Figure 11. Text 4. db 1. SNR 333.5. 1.76 bpp

Figures 9 to 11 show the Text4 image at 1.76 bpp. db1 is the best performing wavelet transform (highest SNR) for this image, and db5 is the worst performing wavelet transform (lowest SNR). The db1 compressed image looks very similar to the original (slightly blurred), while for the db5 compressed image, ripples at the text boundaries from the Gibbs windowing effect can be seen. In the JPEG compressed image, the rippling and blurring of the text is even more severe.

Conclusion and further comments.

Although our results are not conclusive (due to the fact that we used the same quantization matrix for all the wavelet filters), it is clear that using wavelet transforms provides significant SNR and perceptual image quality gains over the traditional DCT used in JPEG, especially at low bit rates. This agrees well with published results in the literature (such as in [17]). Adopting wavelet transforms for compression in JPEG2000 should result in a significant improvement in the image quality per bit.

Besides image quality improvements from moving to wavelet transforms, JPEG 2000 also offers increased flexibility that should make it more widely applicable than JPEG, and has other interesting features such as ROI coding and progressive transmission. Our description of JPEG2000 is by no means complete. Visit http://www.jpeg.org for more information.

Further comments:

1. We saw in our experiment that different wavelet filters produce results of different quality for different classes of images. For a particular image class, some filters generally perform better than others.

However, it does not seem well understood what the best wavelet transform to use for a given image type is. And for a given wavelet transform, what is the optimal quantization matrix?

In "Visibility of wavelet quantization noise", Watson et al. [4] develop a perceptually lossless quantization matrix for the 9/7 biorthogonal filter. The reasons they give for selecting this particular filter are that the filter is "i) linear phase ii) symmetrical iii) argued to have mathematical properties attractive for image compression iv) used by FBI for compression of fingerprint images."

These reasons seem less than compelling to us. What seems to be missing is an extensive body of experimental work with human subjects, describing the optimal (or at least well chosen) quantization matrices for different filters, and the subjective performance of the filters on a large bank of images of different types.

From our reading of the literature, at present, the process of selecting the wavelet filter and quantization matrix to use seems very arbitrary. More work will have to be done to try to determine the best wavelet filter and quantization matrix to use for a given class of images, and this is a possible future direction for this project.

2. An obvious extension would be to look at color as well as black and white images. We looked at b&w images only in this project for simplicity.

3. JPEG 2000 is a very ambitious image standard which should provide much higher image quality and flexibility than JPEG. However, this comes at a cost of increased computational complexity. Wavelet transforms generally take longer to perform than the DCT. Implementation is also likely to be more complex.

We think it's an open question whether JPEG 2000 can become widely accepted, especially as a standard for Web imaging, since inertia from existing image formats is great. e.g. PNG has not yet caught on.

References

[1] JPEG2000 Tutorial. Christopoulos, Charilaos. Lecture given at IEEE Int. Conference on Image Processing (ICIP 99), in Kobe, Japan, 24-28 Oct 99. Page 18. http://etro.vub.ac.be/~chchrist/jpeg2000_contributions.htm

[2] JPEG2000 Committee draft version 1.0, 9th December 1999. Page 81. http://www.jpeg.org/CD15444-1.htm

[3] ibid. Page 82.

[4] Visibility of wavelet quantization noise. Watson, A.B.; Yang, G.Y.; Solomon, J.A.; Villasenor, J.
IEEE transactions on image processing, Vol. 6, No. 8, August 1997, Pages 1164 to 1175.

[5] Foundations of Vision. Wandell, Brian. Published by Sinauer Associates, 1995.

[6] Image compression using Wavelets. Yeung E. IEEE 1997 Canadian Conference on Electrical and Computer Engineering, 1997. Engineering Innovation: Voyage of Discovery, Vol. 1, Pages 241 - 244.

[7] JPEG2000 Tutorial. Page 103.

[8] JPEG2000 Tutorial. Page 92

[9] Region of interest coding in JPEG2000 for interactive client/server applications. Cruz, D.S.; Ebrahimi, T.; Larsson, M.; Askelof, J.; Christopoulos, C. Multimedia Signal Processing, 1999 IEEE 3rd Workshop. Pages 389 - 394.

[10] JPEG2000 Tutorial. Page 52.

[11] JPEG2000 Tutorial. Pages 63 and 64.

[12] JPEG2000 Tutorial. Pages 80 to 85.

[13] JPEG2000 Tutorial. Page 83.

[14] JPEG2000 Tutorial. Page 141.

[15] JPEG2000 Tutorial. Page 142

[16] JPEG2000 Committee draft version 1.0, 9th December 1999. Page 95.

[17] Wavelet transforms in a JPEG-like image decoder. de Queiroz, R.; Choi, C.K.; Huh, Y.; Rao, K.R. IEEE Transactions on Circuits and Systems for Video Technology, Vol 7, No. 2, April 1997, Pages 419 to 424.

The IEEE papers cited here are available at the IEEE online library at http://iel.ihs.com/

http://www.jpeg.org is a good starting point to learn more about jpeg.

An informative lecture on JPEG2000 given by Majid Rabbani, but not cited directly in our report, is available at http://foulard.ee.cornell.edu/hemami/Cornell_JPEG2K.PDF

Appendix

1. Plots of SNR vs bitrate.
2. Matlab .mat data files. These contain the data needed to generate the SNR vs bitrate plot. Load the appropriate .mat file, and run makeplots.m to generate the plot.
3. BMP Image files produced from the experiments in Part 2. The image files are named in this manner: originalimagefilename_filtertype_jpegqualityfactor_SNR_bpp.bmp
filtertype is "j" for jpeg, db1,db2,db4,db5,db8 for the Daubechies filter, and sym2, sym4, sym5, sym8 for the Symlets.
4. Matlab code used to perform the experiments. begin.m is the topmost file.
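
For readers who want to process the image files in item 3 in bulk, the naming convention could be parsed along these lines. The example file name and the assumption that the SNR and bpp fields are written as plain decimal numbers are ours; check them against the actual files before relying on this.

```python
def parse_result_name(filename):
    """Split originalimagefilename_filtertype_jpegqualityfactor_SNR_bpp.bmp into fields."""
    stem = filename[:-4] if filename.lower().endswith(".bmp") else filename
    # Split from the right, so underscores in the original image name survive.
    image, filtertype, quality, snr, bpp = stem.rsplit("_", 4)
    return {
        "image": image,
        "filter": filtertype,        # "j" for JPEG, or db1 ... db8, sym2 ... sym8
        "quality": int(quality),     # JPEG quality factor the bit rate was matched to
        "snr": float(snr),
        "bpp": float(bpp),
    }

# Hypothetical file name that follows the convention.
info = parse_result_name("barbara_sym5_30_79.4_1.76.bmp")
```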