Color Printer Calibration

Dion Monstavicius
Benjamin Olding
Brian Sheldon

Introduction

This project aimed to calibrate the color output of the HP Designjet 3500 CP. The goal was to accurately reproduce the colors of a calibrated image. All colors lying outside the printer gamut were to be "clipped" to the edge of the gamut.

The Project Vision

We would like a "black box" that can perceptually reproduce a given image on the HP Designjet 3500 CP. Ideally, the input would be given in a perceptual space. For our project we chose xyz space, but CIELAB space could be chosen as well (as long as a white point is suitably defined).

Background

There are two basic approaches taken to calibrating color printers. One is to build up a physical model of the printing system. This involves a knowledge of the physics and chemistry behind the inks and paper used in the process. The second is to build an empirical model of the printer. This approach relies on the measurement of a large number of color samples of known CMYK value.

This data is then used either to build a set of linear equations used for regression analysis or to create a lookup-table for three dimensional interpolation.

Regression models have apparently not been used with much success[1]. However, several authors - including Hung[1] and Schmitt and Hardeberg[2] - have successfully calibrated printers using the lookup-table approach. Two students from the 1999 class of EE362 attempted to simulate a similar approach[3]. We have attempted to fully implement such a scheme.

Our Methods

Sample printing

The printer we wished to calibrate was the large HP Designjet 3500CP in the ISE lab:

Additional information about the printer can be found on the HP 3500CP web page.

Our first step towards calibration was to print test patches of color and measure their xyz values. In order to consistently specify the test colors, we needed to control each ink in the printer individually. We accomplished this by creating PostScript files which print specified values of CMYK to the printer. Rather than directly write PostScript files[4], we created MATLAB functions to generate PostScript code.

The hierarchy of code is as follows:

testpage.m

Make test page of squares: makepage.m

Initialize PostScript file: initps.m

Place a patch of color on page: addcell.m

Define shape of a patch box: defbox.m

Add a line of PostScript code to PS file: al.m

Set PostScript font: setfont.m

Put title on test page: pagetitle.m

The result was a series of test pages of different patches of color.
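As an illustration of how such a test page can be generated, here is a minimal Python sketch. It is not the project's MATLAB code: the function name `patch_page` and the layout parameters are hypothetical. It relies only on two standard PostScript operators, `setcmykcolor` (four ink amounts between 0 and 1) and `rectfill` (x, y, width, height). Whether a given printer driver passes these CMYK values to the inks unmodified is an assumption that would need to be checked.

```python
def patch_page(patches, size=72, margin=36, per_row=6):
    """Return PostScript source drawing one filled square per (c, m, y, k) tuple.

    Units are PostScript points (1/72 inch). Layout values are illustrative.
    """
    lines = ["%!PS-Adobe-3.0"]
    for i, (c, m, y, k) in enumerate(patches):
        col, row = i % per_row, i // per_row
        x = margin + col * size
        ypos = margin + row * size
        # setcmykcolor selects the ink amounts for subsequent painting.
        lines.append(f"{c:.3f} {m:.3f} {y:.3f} {k:.3f} setcmykcolor")
        # rectfill paints a rectangle: x y width height.
        lines.append(f"{x} {ypos} {size} {size} rectfill")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


# Four patches: pure cyan, magenta, yellow, and black.
ps = patch_page([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)])
```

Writing `ps` to a file with a `.ps` extension gives a page that can be sent to a PostScript printer or opened in a PostScript viewer.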

Each test patch is uniquely identified by three intensity values of cyan, magenta, and yellow. The printer, of course, can also print out black ink. Since color space is adequately specified with three values, we used a set formula to determine the amount of black ink to add:

1) Let K equal the smallest value of C, M, and Y.
2) Let C = (C-K)/(1-K), M = (M-K)/(1-K), and Y = (Y-K)/(1-K).

This formula was suggested by Michael Schindler[5]. It substitutes black ink for the component common to all three inks, and then "boosts" the intensity levels of the remaining color inks. This is not the standard conversion, which does not involve dividing by (1-K). For our purposes, however, it does not truly matter what formula we use to derive the amount of black ink. It is only important that we be able to use the full color gamut of the printer, and that we specify the value of black the same way each time.
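The two steps can be sketched in Python as follows. The function name `cmy_to_cmyk` is hypothetical, and the handling of K = 1 (where the division by 1-K is undefined) is an assumption of this sketch; the formula as stated does not cover that case.

```python
def cmy_to_cmyk(c, m, y):
    """Apply the black-substitution rule to ink amounts in the range 0..1."""
    k = min(c, m, y)
    if k >= 1.0:
        # Assumption: all inks at full intensity is treated as pure black,
        # because (1 - K) would be zero in step 2.
        return 0.0, 0.0, 0.0, 1.0
    scale = 1.0 - k
    return (c - k) / scale, (m - k) / scale, (y - k) / scale, k
```

For example, `cmy_to_cmyk(0.8, 0.5, 0.2)` gives K = 0.2 and color inks of approximately 0.75, 0.375, and 0. At least one color ink is always zero after the conversion.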

We then printed over 1100 patches of known CMYK values. Of these points, approximately 700 points were taken on the edge of the color gamut. All of the PostScript test pages can be found in our test page directory.

The xyz value of each patch was measured under a constant illuminant using a "SpectraScan 650". Our illuminant is shown below, along with the spectral range of the SpectraScan 650:

Our apparatus set-up is depicted here:

Once a test page of data was measured with the spectroradiometer, we converted each patch spectrum into an xyz value and merged it with the known values of CMY space using the function mergecmyxyz.m. We also saved the original spectrum data. Because the illuminant spectrum was measured as well, this data determines the reflectance of each ink patch. Thus - if we desired - we could predict what the inks would look like under a different illuminant. All collected data can be found in our data directory.
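A rough Python sketch of these two operations is given below. It is not mergecmyxyz.m: the function names are hypothetical, the color matching functions must be supplied by the caller at the same wavelengths as the spectrum, and the result is left unnormalized. The three-sample spectra in the example are made-up placeholders chosen only to keep the arithmetic easy to follow.

```python
def spectrum_to_xyz(radiance, cmf, step_nm):
    """Sum the spectrum against color matching functions.

    radiance: measured values, one per wavelength sample.
    cmf: list of (xbar, ybar, zbar) tuples at the same wavelengths.
    step_nm: wavelength spacing of the samples.
    """
    return tuple(
        step_nm * sum(s * weights[j] for s, weights in zip(radiance, cmf))
        for j in range(3)
    )


def relight(radiance, old_illuminant, new_illuminant):
    """Predict the patch spectrum under a different illuminant.

    Reflectance is radiance / illuminant, so the new spectrum is
    reflectance * new illuminant. Assumes the old illuminant is nonzero
    at every sample.
    """
    return [
        s / e_old * e_new
        for s, e_old, e_new in zip(radiance, old_illuminant, new_illuminant)
    ]


# Placeholder data, not measurements.
toy_radiance = [0.5, 1.0, 1.5]
toy_old = [1.0, 2.0, 3.0]
toy_new = [2.0, 2.0, 2.0]
toy_cmf = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

relit = relight(toy_radiance, toy_old, toy_new)
xyz = spectrum_to_xyz(relit, toy_cmf, step_nm=10.0)
```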

Figure 1: Patch data in CMY and xyz Space

XYZ to CMY

Inside the color gamut

When the requested xyz value falls within the color gamut, we must attempt to recreate the xyz value using our known data set of xyz-CMY values. Since the requested value can be from a continuous range of values, and we only have a finite number of points which directly map to CMY space, a 3D interpolation scheme must be implemented.

In three dimensions, the fewest points that can enclose a volume are the four vertices of a tetrahedron. Our algorithm thus finds the tetrahedron surrounding the given xyz coordinate and uses the CMY values of the vertices to interpolate the coordinate's CMY value. An interior point of a tetrahedron divides it into four smaller tetrahedrons. The volumes of these smaller tetrahedrons, relative to the volume of the whole, serve as the weights of the CMY values of the vertices. For example, the weight w₁ given to the CMY values of the first vertex is:

w₁ = Volume(P₀P₂P₃P₄) / Volume(P₁P₂P₃P₄)
where P₀ is the xyz coordinate whose CMY value is to be determined and P₁ through P₄ are the four points of known data that make up the surrounding tetrahedron.
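The weighting can be sketched in Python as below. This is an illustration rather than the code in tetravolume.m, and all function names are hypothetical. The sketch uses signed volumes, which is a small departure from the plain volume ratio above: for an interior point the two agree, but with signed volumes a negative weight also reveals that the point lies outside the tetrahedron.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def signed_volume(p1, p2, p3, p4):
    """Signed volume of a tetrahedron: det[p2-p1, p3-p1, p4-p1] / 6."""
    a, b, c = sub(p2, p1), sub(p3, p1), sub(p4, p1)
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det / 6.0


def tetra_weights(p0, verts):
    """Weight i is the volume with vertex i replaced by p0, over the total."""
    total = signed_volume(*verts)
    weights = []
    for i in range(4):
        replaced = list(verts)
        replaced[i] = p0
        weights.append(signed_volume(*replaced) / total)
    return weights


def interpolate_cmy(p0, verts, cmy_values):
    """Blend the CMY values of the four vertices using the volume weights."""
    w = tetra_weights(p0, verts)
    return tuple(sum(wi * cmy[j] for wi, cmy in zip(w, cmy_values))
                 for j in range(3))


# Illustrative tetrahedron whose vertex CMY values equal its xyz coordinates.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
weights = tetra_weights((0.25, 0.25, 0.25), verts)
cmy = interpolate_cmy((0.25, 0.25, 0.25), verts, verts)
```

In this example all four weights come out to one quarter, and they always sum to one.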

This procedure of weighting the vertices of the surrounding tetrahedron in xyz space seems fairly standard. See, for example, the work of Francis Schmitt and Jon Yngve Hardeberg[2] or last year's project by Suiqiang Deng and Feng Xiao[3].

Unlike the algorithms used in the 3D interpolation schemes of the above references, however, our algorithm does not attempt to pre-compute which points serve as vertices on a "surrounding tetrahedron". We see several advantages to this. First, the algorithm can go straight to the relevant data points by sorting through a simple list of data, rather than try to traverse a structure of adjacent tetrahedrons. Second, the "best" tetrahedron is always selected as the surrounding tetrahedron. A mere eight points can be used to generate a total of sixty-four tetrahedrons. Our algorithm guarantees that the most appropriate tetrahedron (i.e. the surrounding tetrahedron with the smallest volume) is always selected. This will always be true no matter what data is collected.

The algorithm works as follows:

(1) Use given xyz point to divide the data space into the eight octants of Cartesian space. (Imagine the xyz point sitting at the origin.) Find the closest data point in each of the surrounding octants. This guarantees that the xyz coordinate is in fact in the color gamut.
(2) Generate the 64 possible tetrahedrons that can be constructed from these surrounding points. Order in terms of volume from smallest to largest.
(3) Use the first tetrahedron on the list that contains the original xyz point.

From step (1) of the above algorithm, it is clear that it is extremely important to probe where the edges of the color gamut lie (which is why the majority of our data set consists of edge points).
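The three steps can be sketched in Python as follows. This is not findtetra.m; every name here is hypothetical. One interpretation is built in and should be read as an assumption: eight points give 70 four-point combinations, and the sketch discards the six whose points all lie on the same side of an axis plane through the query point, since those cannot surround it. That leaves 64, which matches the count quoted above and the description of coplanar.m, but the source does not spell this out.

```python
from itertools import combinations


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def signed_volume(p1, p2, p3, p4):
    a, b, c = sub(p2, p1), sub(p3, p1), sub(p4, p1)
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det / 6.0


def contains(p0, tetra, eps=1e-12):
    """True if p0 lies inside (or on the boundary of) a non-degenerate tetra."""
    total = signed_volume(*tetra)
    if abs(total) < eps:
        return False
    for i in range(4):
        corners = list(tetra)
        corners[i] = p0
        if signed_volume(*corners) / total < -eps:
            return False
    return True


def octant(p, origin):
    """3-bit octant index of p relative to origin, one bit per axis."""
    return sum(1 << axis for axis in range(3) if p[axis] >= origin[axis])


def nearest_per_octant(origin, data):
    """Step 1: closest data point in each octant, or None if one is empty."""
    best = {}
    for p in data:
        d2 = sum(d * d for d in sub(p, origin))
        key = octant(p, origin)
        if key not in best or d2 < best[key][0]:
            best[key] = (d2, p)
    return [best[k][1] for k in range(8)] if len(best) == 8 else None


def same_half_space(points, origin):
    return any(len({p[axis] >= origin[axis] for p in points}) == 1
               for axis in range(3))


def candidate_tetrahedra(origin, neighbours):
    """Step 2: four-point combinations that could surround the origin."""
    return [t for t in combinations(neighbours, 4)
            if not same_half_space(t, origin)]


def find_tetra(origin, data):
    neighbours = nearest_per_octant(origin, data)
    if neighbours is None:
        return None  # treated here as "outside the sampled gamut"
    candidates = sorted(candidate_tetrahedra(origin, neighbours),
                        key=lambda t: abs(signed_volume(*t)))
    # Step 3: the smallest candidate that actually contains the point.
    for tetra in candidates:
        if contains(origin, tetra):
            return tetra
    return None


# Illustrative data: the eight corners of a cube around the query point.
corners = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
query = (0.1, 0.2, 0.3)
found = find_tetra(query, corners)
```

Returning `None` when an octant is empty is the sketch's way of flagging a point for the out-of-gamut handling.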

The hierarchy of code is as follows:

xyz2cmy.m

Find surrounding tetrahedron: findtetra.m

Check for 4 points in same half-space: coplanar.m

Find volume of tetrahedron: tetravolume.m

Outside the color gamut

When the requested xyz value falls outside the color gamut, the printer cannot exactly recreate the requested color. Our algorithm attempts to find the closest point on the edge of the color gamut and uses that point's value for the outlying point.

We took a "best effort" approach to finding the nearest edge point. The algorithm goes through the following steps to try to find the nearest xyz coordinate in the printer gamut:

(1) Find the 3 closest points to the xyz coordinate outside the gamut.
(2) These three points form a plane; project the xyz coordinate onto the plane using the plane's normal.
(3) If the projected point lies within the triangle formed by the three closest points, interpolate the projected point's CMY value using the vertices of the triangle.
(4) If the projected point falls outside the triangle, start over using the two closest points.
(5) These two points form a line; project the xyz coordinate orthogonally onto the line.
(6) If the projected point lies on the line segment between the two closest points, interpolate the projected point's CMY value using the endpoints of the line segment.
(7) If the projected point lies outside the line segment, use the CMY values of the nearest xyz data point.

Again, it should be clear that for this type of algorithm to be effective, the edges of the color gamut must be well defined.
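The seven steps can be sketched in Python as below. This is an illustration with hypothetical names, not the project's xyz2cmy.m. It computes the triangle weights with cross products, which is equivalent to taking ratios of triangle areas; whether trianglearea.m is used in exactly this way is an assumption.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mix(weights, values):
    """Weighted sum of CMY triples."""
    return tuple(sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(3))


def clip_to_gamut(p, samples):
    """samples: list of (xyz, cmy) pairs; returns a CMY value for point p."""
    # Step 1: the three samples closest to p.
    near = sorted(samples, key=lambda s: dot(sub(s[0], p), sub(s[0], p)))[:3]
    (a, cmy_a), (b, cmy_b), (c, cmy_c) = near

    # Steps 2-3: barycentric weights of p projected onto the plane of a, b, c.
    u, v, w = sub(b, a), sub(c, a), sub(p, a)
    n = cross(u, v)
    nn = dot(n, n)
    if nn > 1e-12:
        wc = dot(cross(u, w), n) / nn
        wb = dot(cross(w, v), n) / nn
        wa = 1.0 - wb - wc
        if min(wa, wb, wc) >= 0.0:
            return mix((wa, wb, wc), (cmy_a, cmy_b, cmy_c))

    # Steps 4-6: project onto the line through the two closest samples.
    len2 = dot(u, u)
    if len2 > 1e-12:
        t = dot(w, u) / len2
        if 0.0 <= t <= 1.0:
            return mix((1.0 - t, t), (cmy_a, cmy_b))

    # Step 7: fall back to the nearest sample.
    return cmy_a


# Illustrative surface patch: three samples whose CMY equals their xyz.
samples = [((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
           ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
           ((0.0, 1.0, 0.0), (0.0, 1.0, 0.0))]
inside_triangle = clip_to_gamut((0.2, 0.2, 0.5), samples)
on_segment = clip_to_gamut((0.5, -1.0, 0.0), samples)
nearest_only = clip_to_gamut((-1.0, -1.0, 0.0), samples)
```

The three example calls exercise the triangle, line segment, and nearest-point cases in turn.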

The hierarchy of code for this part of the algorithm is as follows:

xyz2cmy.m

Find area of triangle: trianglearea.m

Augmenting the data set

For a 3D lookup table to be most effective, we want all the xyz data to be equally spaced. Since the relationship between CMY and xyz space is non-linear, this would mean unequal spacing in CMY space. However, one cannot determine how unequal this spacing should be without a function that converts xyz space to CMY space, which - of course - is the entire purpose of the project.

We thus began investigating xyz space using regularly spaced values of CMY space. As shown in figure 1, this regular sampling of CMY space resulted in rather irregular sampling of xyz space. Once we had a rough idea of what xyz space looked like, however, we could begin probing those areas which were undersampled. We used MATLAB code to automatically generate new test patches to measure. The algorithm is as follows:

(1) Randomly generate a large number of xyz points contained in the color gamut.
(2) Find the volumes of the tetrahedrons that surround each point.
(3) Use the points corresponding to the tetrahedrons with the largest volumes, since these should result in the worst approximations. We used the largest 10%.
(4) Using the above algorithm, convert the xyz points into CMY space.
(5) Print out the CMY values, measure the true xyz values, and merge the values with the existing data set.
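Steps (1) through (3) can be sketched in Python as follows. This is not generatedata.m. The function name, the bounding box used for random sampling, and the callback `surrounding_volume` (which is assumed to return the volume of the surrounding tetrahedron, or `None` for a point outside the gamut) are all hypothetical.

```python
import random


def pick_undersampled(surrounding_volume, bounds, n_trials=1000, keep=0.10,
                      seed=0):
    """Return the random in-gamut points with the largest surrounding volumes.

    surrounding_volume: callable mapping an xyz point to a volume or None.
    bounds: three (low, high) pairs giving a box that encloses the gamut.
    keep: fraction of the accepted points to return (10% by default).
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_trials):
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        vol = surrounding_volume(p)
        if vol is not None:          # skip points outside the gamut
            scored.append((vol, p))
    scored.sort(reverse=True)        # largest volume first
    n_keep = max(1, round(keep * len(scored))) if scored else 0
    return [p for _, p in scored[:n_keep]]


# Stand-in for the real lookup: pretend the volume grows with the first
# coordinate and that points with a small first coordinate are out of gamut.
def toy_volume(p):
    return p[0] if p[0] > 0.2 else None


picked = pick_undersampled(toy_volume, [(0.0, 1.0)] * 3, n_trials=200)
```

The returned points would then be converted to CMY, printed, and measured as in steps (4) and (5).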

The hierarchy of code is as follows:

testpage.m

Automatically generate new data: generatedata.m

Find surrounding tetrahedron: findtetra.m

Convert xyz to CMY: xyz2cmy.m

This approach also provided a very convenient way to determine how accurate our algorithm was. In essence, this method seeks out and selects the xyz points the program should be worst at predicting. By comparing these predictions to the actual measured xyz values, we were able to determine our error in the worst-case situations.

Results

Using the method above, we simulated the places where our error should be the worst. We found the 5 xyz points farthest from neighboring xyz points within the lookup table. We then ran our xyz2cmy algorithm on these points and printed the results. Using the spectroradiometer, we measured the actual xyz values of the patches that came out of the printer. The results are plotted in CIELAB space using the plain paper reading as the white point:

As you can see, the delta Eab we measured from our input values is less than 5. Furthermore, the delta Eab seemed to be decreasing with more points, as we might expect with our "find the farthest point" search method. Encouraged by the above data, we added these points to our lookup table and searched for the next 10 farthest points. The discrepancy is plotted below:

This data is less encouraging. It seems to suggest no trend in delta Eab. Furthermore, a large spike near 12 delta Eab reduced our confidence in a perceptually accurate algorithm. However, our code is optimized in xyz space, not CIELAB space. It is entirely possible that a smaller xyz difference can result in a larger CIELAB difference. Due to time (and missing spectroradiometer) constraints, we could not take a large data set to more accurately determine our error. For the 15 points we did take, we found an average error of approximately 4 delta Eab units, with a maximum delta Eab error of 12.
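For reference, the delta Eab figure between a requested and a measured xyz value can be computed as in the Python sketch below, using the standard CIE 1976 L*a*b* formulas. The function names are hypothetical, and the white point in the example is a made-up placeholder, not our plain paper reading.

```python
import math


def _f(t):
    """CIELAB companding function with its linear segment near black."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(xyz, white):
    """Convert an xyz triple to (L*, a*, b*) relative to the given white."""
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e_ab(xyz_a, xyz_b, white):
    """Euclidean distance between two colors in CIELAB space."""
    return math.dist(xyz_to_lab(xyz_a, white), xyz_to_lab(xyz_b, white))


# Placeholder white point for illustration only.
paper_white = (80.0, 85.0, 70.0)
gray = tuple(0.18 * w for w in paper_white)
lab_gray = xyz_to_lab(gray, paper_white)
```

Because the conversion is nonlinear, two xyz pairs that are equally far apart can have quite different delta Eab values, which is the effect described above.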

We did not attempt to calculate the error of outlying points. For outliers, it is hard to determine how well the algorithm works because we never intended to recreate xyz values outside the gamut. Such a measurement could never separate algorithm error from the outlier's distance to the gamut.

Functionally, we intended our project to take an image of xyz space, convert it to CMY space, and print to the printer. Therefore, we created two additional MATLAB m-files which can convert an entire image of xyz points to CMY space and use it to create a PostScript file.

The hierarchy of code is:

convertxyzimage.m

xyz to CMY function: xyz2cmy.m

Print CMYK image as a PostScript: printcmyk.m

Conclusions

We created a large set of CMY to XYZ calibration data. We believe our XYZ to CMYK 3D lookup and interpolation scheme is a valid "first pass" at perceptual recreation of data on the HP 3500 CP, especially when the required values fall within the color gamut of the printer. A more sophisticated model may be required if many points will fall outside the color gamut of the printer, such as an algorithm that attempts to squeeze outlying points into the color gamut while displacing internal points farther inward.

Post-Presentation Notes

We presented this project to the class on March 10, 2000. Amran Silverstein pointed out to us that volume is not an appropriate variable to determine potential error. A point in xyz space that is surrounded by a very large tetrahedron may still be interpolated accurately if it is very close to a vertex of the tetrahedron. Thus, the most appropriate variable to search for is "distance from nearest vertex."
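The difference between the two measures can be seen in a small Python sketch. The function name and the example coordinates are hypothetical and chosen only to make the contrast obvious.

```python
import math


def nearest_vertex_distance(p, verts):
    """Distance from p to the closest vertex of its surrounding tetrahedron."""
    return min(math.dist(p, v) for v in verts)


# A large tetrahedron with the query point right next to one vertex...
big = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
d_big = nearest_vertex_distance((0.1, 0.1, 0.1), big)

# ...and a tetrahedron 1000 times smaller in volume with the point at its center.
small = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
d_small = nearest_vertex_distance((0.25, 0.25, 0.25), small)
```

Here the point in the large tetrahedron is closer to measured data than the point in the small one, so a volume-based search would rank the two in the wrong order.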

References

[1] Hung, Po-Chieh. Colorimetric Calibration in Electronic Imaging Devices Using a Look-Up-Table Method and Interpolations. Journal of Electronic Imaging, 2(1):53-61, 1993.

[2] Schmitt, Francis and Jon Yngve Hardeberg. Color Printer Characterization by 3D Triangulation.

[3] Deng, Suiqiang, and Feng Xiao. Mini Color Management System, EE 362. 1999.

[4] Weingartner, Peter. A First Guide to PostScript.

[5] Schindler, Michael. Things You Should Know About Color Printing.

Appendix

This project was coded entirely in MATLAB. All the m-files are listed in our code directory.

All the collected data was stored in mat-files. They can be found in our data directory.

All the test pages sent to the printer were processed as a PostScript file. These files have been stored in our test page directory.