Multiframe Resolution Enhancement
Obtaining Super-resolution Images from Low Resolution Sequences

Maher Khoury and Stephen Rose
Psychology 221/EE 362 Project




General Information
Multiframe resolution enhancement, or super-resolution, seeks to obtain a single high resolution image from several low resolution images.  These images must, of course, be of the same object and must be taken from slightly different angles.

There are different mathematical approaches to this problem, but all seek to address similar aspects of how to combine the non-redundant image information in multiple frames.  Clearly, if successive frames are exact duplicates of one another, no new information is available, and none can be obtained.  Successive images must be taken from slightly different perspectives, but not so different as to change the overall appearance of the object(s) in the image.

Terms used throughout this report are as follows:
    LR = Low Resolution.  Refers to the multiple frames which are to be combined into a higher resolution image.
    SR = Super Resolution.  Refers to the higher quality image obtained from the LR images.

Thinking about this on a more detailed level, one may ask how it is possible to combine images which are taken at different angles.  Since images are based on units of pixels, features such as object boundaries appear as discrete steps which will not fall at the same pixel locations in successive frames.  This, however, is precisely the property which enables us to extract more information than is available in a single LR image.

The first step is to define a reference LR frame.  This is typically the "first" frame in the sequence.  This frame is expanded to the desired SR image resolution (e.g. a 150x150 pixel LR image is expanded to a 300x300 pixel image).  The additional pixels can be inferred as a first guess by any one of several methods, the simplest of which is bilinear interpolation.  The upsampled reference frame will simply be a blurred version of the original, but at the higher resolution of the desired SR image.  The goal now is to use the information from the rest of the LR images to obtain a more accurate high resolution image.
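The upsampling step can be sketched as follows.  This is a minimal numpy version of bilinear interpolation written for this report; the function name and the 2x factor are illustrative choices, not part of any of the methods discussed.

```python
import numpy as np

def bilinear_upsample(lr, factor):
    """Upsample a 2-D image by `factor` using bilinear interpolation."""
    h, w = lr.shape
    H, W = h * factor, w * factor
    # Coordinates of each SR pixel mapped back onto the LR grid.
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]          # vertical interpolation weights
    wx = (xs - x0)[None, :]          # horizontal interpolation weights
    top = (1 - wx) * lr[np.ix_(y0, x0)] + wx * lr[np.ix_(y0, x1)]
    bot = (1 - wx) * lr[np.ix_(y1, x0)] + wx * lr[np.ix_(y1, x1)]
    return (1 - wy) * top + wy * bot

lr = np.array([[0.0, 1.0], [2.0, 3.0]])
sr = bilinear_upsample(lr, 2)        # 2x2 -> 4x4 first guess for the SR grid
```

Applied to a 150x150 LR frame with factor 2, this produces the blurred 300x300 first guess for the SR grid described above.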

Motion Estimation
In order to "overlay" frames from different angles onto the reference frame, there must be some way of transforming the LR pixels onto the SR grid.  This can be done in several different ways depending on the type of motion that is present from frame to frame.  The most common methods are:
        Affine Projection
        Quadratic Projection
We have used affine projection extensively throughout our project.
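As a sketch of what affine projection means in practice, the numpy fragment below maps LR pixel coordinates into a common reference frame under a six-parameter affine model; the specific rotation and translation values are made up for illustration and are not taken from any of the papers discussed.

```python
import numpy as np

def affine_map(points, a, b):
    """Map (x, y) coordinates with an affine model: p' = A p + b.

    `a` is a 2x2 matrix (rotation/scale/shear) and `b` a 2-vector
    (translation); together these six parameters describe the
    frame-to-frame motion assumed by the affine model.
    """
    return points @ np.asarray(a).T + np.asarray(b)

# Example: rotate LR pixel coordinates by 5 degrees and shift by
# (0.5, -0.25), a typical small inter-frame motion.
theta = np.deg2rad(5.0)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
b = np.array([0.5, -0.25])
pts = np.array([[0.0, 0.0], [1.0, 0.0]])
warped = affine_map(pts, A, b)
```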

Super-resolution Methods
The methods developed so far can be divided into three basic categories:
    1)  Frequency domain reconstruction
    2)  Iterative methods
    3)  Bayesian methods

Frequency Domain Reconstruction
    This method was first proposed by Tsai and Huang [4].  Unlike all other methods, the data is first transformed to the frequency domain where it is then combined.  This data is then transformed back into the spatial domain where the new image will have a higher resolution than the original frames.

    There is very little underlying "theory" behind this method other than basic frequency-domain manipulations and general Fourier Transform methods.  We can say with certainty, however, that the paper by Tsai and Huang, for all it does to further the notion of multiframe resolution enhancement, manages to confuse the reader at every turn with inconsistent and generally obscure notation.  One particularly irritating feature of the paper is that many integer range variables have no stated domain of definition (e.g. [0,N-1] or [1,N]).  Once beyond that, however, which is no trivial task, the methods are mostly straightforward from a computational standpoint.

    There are two specific points to note here.  The first is that this method assumes only translational motion between frames.  The second is that it also assumes the image is bandlimited in the frequency domain (this will become apparent later).
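The translational-motion assumption is what makes the frequency-domain formulation attractive: a pure shift in the spatial domain appears as a simple phase ramp in the frequency domain.  The numpy sketch below verifies the shift theorem for an integer circular shift, a toy stand-in for true inter-frame motion (the sizes and shift values are arbitrary):

```python
import numpy as np

# Shift theorem: translating an image by (dy, dx) multiplies its Fourier
# transform by the phase ramp exp(-2*pi*i*(u*dy + v*dx)), with u, v in
# cycles per sample.  Relations of this kind are what allow shifted LR
# frames to be combined in the frequency domain.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
dy, dx = 2, 3
shifted = np.roll(img, (dy, dx), axis=(0, 1))   # circular shift as a toy model

M, N = img.shape
u = np.fft.fftfreq(M)[:, None]                  # vertical frequencies (k/M)
v = np.fft.fftfreq(N)[None, :]                  # horizontal frequencies (l/N)
ramp = np.exp(-2j * np.pi * (u * dy + v * dx))
predicted = np.fft.fft2(img) * ramp

assert np.allclose(np.fft.fft2(shifted), predicted)
```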

    Because of the obscurity of the presentation by Tsai and Huang, we will give a detailed description of the algorithm and the variables used therein.

For more detailed information on the method of Tsai and Huang, refer to:


Projection Onto Convex Sets (POCS) Method (Iterative)
    The POCS method was originally developed by Tekalp, Ozkan, and Sezan [1].
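To convey the flavor of the algorithm, the sketch below alternates projections onto two convex sets: images with pixel values in [0,1], and images whose block-average downsampling reproduces an observed LR frame.  This is a deliberately simplified observation model (single frame, no blur, no motion) written for this report, not the exact constraint sets of Tekalp, Ozkan, and Sezan.

```python
import numpy as np

def project_amplitude(x, lo=0.0, hi=1.0):
    """Projection onto the convex set of images with values in [lo, hi]."""
    return np.clip(x, lo, hi)

def project_data(x, lr, factor):
    """Projection onto the set of SR images whose block-average
    downsampling reproduces the observed LR frame."""
    h, w = lr.shape
    means = x.reshape(h, factor, w, factor).mean(axis=(1, 3))
    # Spread each block's residual uniformly: the least-change correction.
    correction = (lr - means).repeat(factor, 0).repeat(factor, 1)
    return x + correction

# Alternate the projections -- the core POCS iteration (toy example).
factor = 2
truth = np.linspace(0, 1, 16).reshape(4, 4)
lr = truth.reshape(2, 2, 2, 2).mean(axis=(1, 3))   # simulated LR observation
x = np.zeros((4, 4))
for _ in range(10):
    x = project_amplitude(project_data(x, lr, factor))
```

The result lies in the intersection of the two sets: it has valid amplitudes and is consistent with the LR observation.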

For more detailed information on the POCS method, refer to:


Method of Irani and Peleg (Iterative)
    This method, proposed by Michal Irani and Shmuel Peleg [8], falls into the class of iterative algorithms.
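The core loop can be sketched as follows: simulate the LR observation from the current SR estimate, then back-project the simulation error onto the SR grid.  This is a single-frame, no-blur toy version written for this report, not Irani and Peleg's full algorithm (which handles multiple frames, motion, and a blur kernel):

```python
import numpy as np

def downsample(x, factor):
    """Simulated imaging: block-average the SR estimate to LR resolution."""
    h, w = x.shape[0] // factor, x.shape[1] // factor
    return x.reshape(h, factor, w, factor).mean(axis=(1, 3))

def upsample(e, factor):
    """Back-project an LR image onto the SR grid by pixel replication."""
    return e.repeat(factor, 0).repeat(factor, 1)

def iterative_backprojection(lr, factor, n_iter=20, step=1.0):
    """Back-projection loop: simulate the LR frame from the current SR
    guess, then correct the guess with the back-projected error."""
    x = upsample(lr, factor)              # initial SR guess
    for _ in range(n_iter):
        err = lr - downsample(x, factor)  # observed minus simulated
        x = x + step * upsample(err, factor)
    return x

lr = np.array([[0.2, 0.4], [0.6, 0.8]])
sr = iterative_backprojection(lr, 2)
```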

For more detailed information on the method of Irani and Peleg refer to:


Bayesian Method
    The Bayesian method we consider was developed by Cheeseman [2] at NASA for SR reconstruction of planetary images.  The name comes from Bayes' Theorem, which is as follows:

        P(B|A) = P(A|B) P(B) / P(A)

This theorem states that the probability of B given A can be found if we know the probability of A given B together with the overall probabilities of A and of B.  In images, what we know is what the camera receives, based on the lighting, camera angle, camera sensor resolution, and other physical parameters.  In rendering computer graphics, we know the perfect surface and the physical conditions, and so we can easily construct what a camera would "see".  The problem of super-resolution is the inverse of this: we know what the camera sees, and we want to find the perfect surface.

This method relies largely on the statistical knowledge that pixel-to-pixel differences are typically very small and can be modeled with a probability distribution function.

In general, given a set of pixels from many frames which have been transformed (through an affine projection) to a common space, there will be a number of different candidate surfaces that could have produced the observed values.  The Bayesian method seeks the solution possessing the maximum probability (i.e. the most likely surface given the observed values and the observation conditions).
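Under Gaussian assumptions, the maximum-probability solution has a closed form: it minimizes a data-fit term plus a smoothness term penalizing pixel-to-pixel differences.  The 1-D numpy sketch below illustrates such a MAP (maximum a posteriori) estimate; the operators, sizes, and the weight lam are illustrative choices made for this report, not Cheeseman's actual model.

```python
import numpy as np

# MAP estimate under a Gaussian noise model and a Gaussian smoothness
# prior on neighbouring-pixel differences: minimize
#     ||D x - y||^2 + lam * ||G x||^2
# where D downsamples (the camera model) and G takes first differences
# (the prior that pixel-to-pixel changes are small).
n, factor = 8, 2
m = n // factor
D = np.zeros((m, n))
for i in range(m):
    D[i, i*factor:(i+1)*factor] = 1.0 / factor   # block-averaging camera
G = (np.eye(n) - np.eye(n, k=1))[:-1]            # first-difference operator
lam = 0.1                                        # prior weight (assumed)

y = np.array([0.1, 0.3, 0.7, 0.9])               # observed LR values
A = D.T @ D + lam * G.T @ G
x_map = np.linalg.solve(A, D.T @ y)              # most probable SR signal
```

Larger lam trusts the smoothness prior more; smaller lam trusts the observed data more.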

For more detailed information on the Bayesian method of Cheeseman, refer to:


Conclusions
    In this project we have attempted to present and implement a significant (and hopefully comprehensive) number of the fundamental algorithms on the subject of multiframe image enhancement.  In doing so, we have shown that both the POCS method and the method of Irani and Peleg perform quite adequately in producing an SR image.  The questions still unanswered about the frequency-domain and the Bayesian methods remain for different reasons.  The Bayesian method we believe to be limited both by raw computing power and by the choice of implementation setting (i.e. C++ rather than Matlab).  The difficulty with the frequency-domain method, we strongly believe, lies only in understanding the notation used by the original authors.  In our explanation of the method, unfortunately, there are still some technical programming issues (with certain variables) which are not quite clear.  We believe we could resolve these ourselves either by delving much deeper into the theory behind the method (though we do go quite far, as can be seen in the description we present) or by finding other authors who have presented the method in a more accessible form.
    The results we have obtained from POCS and from Irani and Peleg's method are quite impressive, and we consider those implementations to be a success.

References