General Information
Multiframe resolution enhancement, or super-resolution, seeks to construct a single high resolution image from several low resolution images. These images must, of course, be of the same object and must be taken from slightly different angles.
There are different mathematical approaches to this problem, but all address the same basic question: how to combine the non-redundant image information present in multiple frames. Clearly, if successive frames are exact duplicates of one another, no new information is available and none can be extracted. Successive images must therefore be taken from slightly different perspectives, but not so different that the overall appearance of the object(s) in the image changes.
Terms used throughout this report are as follows:
LR = Low Resolution. Refers to the multiple frames which are to be combined into a higher resolution image.
SR = Super Resolution. Refers to the higher quality image obtained from the LR images.
Thinking about this at a more detailed level, one may ask how it is possible to combine images taken from different angles. Since images are built from discrete pixels, features such as object boundaries appear as discrete steps that will not fall at the same pixel locations in successive frames. This, however, is precisely the property that allows us to extract more information than is available in any single LR image.
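As a toy illustration of this point, consider two samplings of the same one-dimensional signal offset by half an LR pixel: each contains values the other misses, and together they fill in a finer grid. The Python sketch below is purely illustrative and is not part of any of the methods discussed later.

import numpy as np

# Toy 1-D illustration: two low resolution samplings of the same signal,
# offset by half an LR pixel, interleave onto a grid of twice the resolution.
fine = np.sin(2 * np.pi * np.arange(0, 1, 0.05))  # underlying signal on a fine grid
lr_a = fine[0::2]                                 # LR "frame" sampled at even positions
lr_b = fine[1::2]                                 # LR "frame" shifted by half an LR pixel

sr = np.empty_like(fine)
sr[0::2] = lr_a                                   # interleave the two frames
sr[1::2] = lr_b
print(np.allclose(sr, fine))                      # True: together they recover the fine grid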
The first step is to define a reference LR frame. This is typically the "first" frame in the sequence. This frame is expanded to the desired SR image resolution (e.g. a 150x150 pixel LR image is expanded to a 300x300 pixel image). The additional pixels can be inferred as a first guess by any one of several methods, the simplest of which is bilinear interpolation. The upsampled reference frame will simply be a blurred version of the original, but at the higher resolution of the desired SR image. The goal now is to use the information from the rest of the LR images to obtain a more accurate high resolution image.
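As a sketch of this first step (illustrative only; the function name and the choice of scipy.ndimage.zoom are our own, not part of any particular method), a reference LR frame can be expanded to the SR grid by bilinear interpolation as follows:

import numpy as np
from scipy.ndimage import zoom

def upsample_reference(lr_frame, factor=2):
    # Expand the reference LR frame to the SR grid by bilinear interpolation.
    # The result is only a blurred first guess; the remaining LR frames are
    # used later to refine it.
    return zoom(lr_frame, factor, order=1)  # order=1 -> bilinear

lr_reference = np.random.rand(150, 150)      # stand-in for the first LR frame
sr_guess = upsample_reference(lr_reference)  # 300x300 initial SR estimate
print(sr_guess.shape)                        # (300, 300)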
Motion Estimation
In order to "overlay" frames from different angles onto the reference
frame, there must be some way of transforming the LR pixels onto the SR
grid. This can be done in several different ways depending on the
type of motion that is present from frame to frame. The most common
methods are:
Affine Projection
Quadratic Projection
We have used affine projection extensively throughout our project.
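As a small illustration of what an affine projection does (the rotation, scale, and translation values below are hypothetical), LR pixel coordinates are mapped onto the SR grid by a linear transformation plus a translation:

import numpy as np

def affine_map(coords, A, t):
    # Map LR pixel coordinates onto the SR grid with an affine transform.
    #   coords : (N, 2) array of (x, y) pixel positions in an LR frame
    #   A      : 2x2 matrix combining rotation, scale, and shear
    #   t      : length-2 translation vector
    return coords @ A.T + t

# Hypothetical parameters: scale by 2 (LR grid -> SR grid), rotate by
# 1 degree, and translate by a small subpixel amount.
theta = np.deg2rad(1.0)
A = 2.0 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
t = np.array([0.3, -0.7])

lr_coords = np.array([[0, 0], [10, 5], [149, 149]], dtype=float)
print(affine_map(lr_coords, A, t))  # positions of these LR pixels on the SR grid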
Super-resolution Methods
The methods developed so far can be divided into three basic categories:
1) Frequency domain reconstruction
2) Iterative methods
3) Bayesian methods
Frequency Domain Reconstruction
This method was first proposed by Tsai and Huang [4]. Unlike the other methods, it first transforms the data into the frequency domain, where the frames are combined. The result is then transformed back into the spatial domain, yielding an image with higher resolution than the original frames.
There is very little underlying "theory" behind this method other than basic frequency-domain manipulations and general Fourier transform methods. We can say with certainty, however, that the paper by Tsai and Huang, for all it does to further the notion of multiframe resolution enhancement, manages to confuse the reader at every turn with inconsistent and generally obscure notation. One particularly irritating feature of the paper is that many integer range variables have no stated domain of definition (e.g. [0,N-1] or [1,N]). Once beyond the notation, however, which is no trivial task, the method is mostly straightforward from a computational standpoint.
There are two specific points to note here. The first is that this method assumes only translational motion between frames. The second is that it assumes the underlying image is bandlimited (this will become apparent later).
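The role of purely translational motion can be seen in the Fourier shift property: a (circular) translation of an image multiplies its spectrum by a linear phase ramp, which is what allows shifted frames to be related in the frequency domain. The sketch below only verifies that property numerically; it is not the Tsai and Huang reconstruction itself.

import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))
dx, dy = 3, 5                                  # integer shift, for a clean circular example

shifted = np.roll(img, shift=(dy, dx), axis=(0, 1))

ky = np.fft.fftfreq(64)[:, None]               # vertical frequencies (cycles/sample)
kx = np.fft.fftfreq(64)[None, :]               # horizontal frequencies (cycles/sample)
phase_ramp = np.exp(-2j * np.pi * (kx * dx + ky * dy))

lhs = np.fft.fft2(shifted)
rhs = np.fft.fft2(img) * phase_ramp
print(np.allclose(lhs, rhs))                   # True: spatial shift <-> phase ramp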
Because the presentation in Tsai and Huang's paper is so difficult to follow, we will present a detailed description of the algorithm and the variables used therein.
For more detailed information on the method of Tsai and Huang, refer to:
Projection Onto Convex Sets (POCS) Method (Iterative)
The POCS method was originally developed by Tekalp, Ozkan, and Sezan [1].
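In broad terms, POCS treats each observed LR pixel as defining a set of SR images consistent with it, and repeatedly projects the current SR estimate onto those sets. The sketch below is a heavily simplified illustration of one such sweep, assuming the frames are already registered and that each LR pixel is the block average of the corresponding SR pixels; it omits the blur and motion modelling of the actual method.

import numpy as np

def pocs_sweep(sr, lr_frames, factor=2):
    # One sweep of a simplified POCS update (illustrative sketch only).
    # For each LR pixel, the corresponding factor x factor block of SR pixels
    # is corrected uniformly so that its mean matches the observation; this is
    # the orthogonal projection onto that pixel's consistency set.
    sr = sr.copy()
    for lr in lr_frames:
        for i in range(lr.shape[0]):
            for j in range(lr.shape[1]):
                block = sr[i*factor:(i+1)*factor, j*factor:(j+1)*factor]
                block += lr[i, j] - block.mean()
    return sr

rng = np.random.default_rng(1)
sr_estimate = rng.random((8, 8))                   # current SR estimate (toy size)
lr_frames = [rng.random((4, 4)) for _ in range(3)]  # toy stand-ins for registered LR frames
sr_estimate = pocs_sweep(sr_estimate, lr_frames)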
For more detailed information on the POCS method, refer to:
Method of Irani and Peleg (Iterative)
This method, proposed by Michal Irani and Shmuel Peleg [8], falls into the class of iterative algorithms.
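The core idea is iterative back-projection: simulate LR frames from the current SR estimate, compare them with the observed LR frames, and back-project the error onto the SR grid. The sketch below is a simplified illustration of one update, assuming registered frames and a block-average imaging model; the real algorithm also incorporates motion compensation and a blur kernel.

import numpy as np

def backprojection_step(sr, lr_frames, factor=2, step=1.0):
    # One iterative back-projection update (simplified sketch only).
    # Assumed imaging model: each observed LR frame is the block average of
    # the (already registered) SR image.
    H, W = lr_frames[0].shape
    update = np.zeros_like(sr)
    for lr in lr_frames:
        simulated = sr.reshape(H, factor, W, factor).mean(axis=(1, 3))  # simulate an LR frame
        error = lr - simulated                                          # observed minus simulated
        update += np.kron(error, np.ones((factor, factor)))             # spread the error onto the SR grid
    return sr + step * update / len(lr_frames)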
For more detailed information on the method of Irani and Peleg, refer to:
Bayesian Method
The Bayesian method we consider was developed by Cheeseman [2] at NASA for SR reconstruction of planetary images. The name comes from Bayes' Theorem, which is as follows:
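P(X | Y) = P(Y | X) P(X) / P(Y)

In this setting, X is the unknown high resolution surface and Y is the set of observed LR pixel values.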
This method relies largely on the statistical observation that pixel-to-pixel differences are very small and can be modeled with a probability distribution function.
In general, given a set of pixels from many frames that have been transformed (through an affine projection) into a common space, there are many different images that could have produced the observed values. The Bayesian method seeks the solution with the maximum probability, i.e. the most likely surface given the observed values and the observation conditions.
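As a heavily simplified sketch of this idea (not Cheeseman's actual formulation), one can combine a Gaussian noise model on block-averaged observations with a Gaussian smoothness prior on neighbouring pixel differences, and maximize the resulting posterior by gradient descent on its negative logarithm. All model choices and parameter values below are our own illustrative assumptions, and motion between frames is ignored.

import numpy as np

def neg_log_posterior_grad(sr, lr_frames, factor=2, noise_var=0.01, prior_var=0.1):
    # Gradient of a simplified negative log posterior (illustrative sketch).
    # Likelihood: each LR pixel = block average of the registered SR image
    # plus Gaussian noise.  Prior: neighbouring SR pixel differences are
    # small (Gaussian), per the smoothness assumption described above.
    H, W = lr_frames[0].shape
    grad = np.zeros_like(sr)
    for lr in lr_frames:
        simulated = sr.reshape(H, factor, W, factor).mean(axis=(1, 3))
        residual = (simulated - lr) / noise_var
        grad += np.kron(residual, np.ones((factor, factor))) / factor**2
    dx = np.diff(sr, axis=1)                     # horizontal pixel differences
    dy = np.diff(sr, axis=0)                     # vertical pixel differences
    grad[:, 1:] += dx / prior_var
    grad[:, :-1] -= dx / prior_var
    grad[1:, :] += dy / prior_var
    grad[:-1, :] -= dy / prior_var
    return grad

def map_estimate(lr_frames, factor=2, iters=200, rate=0.05):
    # Crude MAP reconstruction by gradient descent (sketch only).
    sr = np.kron(lr_frames[0].astype(float), np.ones((factor, factor)))  # initial guess
    for _ in range(iters):
        sr -= rate * neg_log_posterior_grad(sr, lr_frames, factor)
    return sr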
For more detailed information on the Bayesian method of Cheeseman, refer to:
Conclusions
In this project we have attempted to present and implement a significant (and hopefully comprehensive) number of the fundamental algorithms on the subject of Multiframe Image Enhancement. In doing so, we have shown that both the POCS method and the method of Irani and Peleg perform quite adequately in producing an SR image.
The questions still open about the frequency-domain and Bayesian methods remain for different reasons. For the Bayesian method, we believe the issue is one of raw computing power and of the correct setting for the implementation (i.e. C++ rather than Matlab). For the frequency-domain method, we strongly believe the difficulty lies only in understanding the notation used by the original authors. In our explanation of the method, unfortunately, there are still some technical programming issues (with certain variables) which are not quite clear. We believe we could resolve these ourselves either by delving much deeper into the theory behind the method (though we do go quite far, as can be seen in the description we present) or by finding other authors who have presented the method in a more easily understood context.
The results we have obtained from POCS and from
Irani and Peleg's method are quite impressive and we do consider those
implementations to be a success.