Notes on PSF Matching

In practice, the steps for image matching are:

1) Determine the geometrical transform to register the images;
2) Apply this geometrical transform;
3) Determine the convolution kernel for PSF-matching from one or more sections;
4) Convolve the image with this (or a modified) kernel;
5) Determine the correct intensity transform (slope and offset);
6) Apply the intensity transform.

These notes discuss the third step.

The following notation is used:

	lower case = function
	upper case = FT of function
	FT{ function } = FT of enclosed function

so

	f(x)  has a FT  F(s)
	F(s) = FT{ f(x) }

Since the inverse FT is the same as the forward FT except for normalization,
for simplicity we will not make the distinction below; hence

	f(x) = FT{ F(s) } = FT{ FT{ f(x) }}

Convolution is represented by "*"; multiplication is implicit.

Remember in the arguments which follow that the FT of a gaussian is also a
gaussian, and the broader the gaussian is in one space the narrower it will be
in the other.  The extreme cases are a delta function (or "point function")
transforming into a constant and vice versa.  Also, the step function and sinc
function form another transform pair.

Finally, I will often treat the problem as 1-dimensional for simplicity; the
extension to 2-d is obvious.


I.  Introduction

The PSF-matching problem can be simply stated as follows:  we have two
registered images (i.e., all features are spatially matched), but because of
seeing variations, guiding errors, etc., they will have different point
spread functions (PSFs).  Let us assume, for simplicity, that one image (the
reference frame "r") has a broader PSF than the other (the input frame "i").
We can write

	r = i*k

where "k" is some convolution kernel describing the difference in seeing, etc.
We must solve for the unknown kernel "k".  From the convolution theorem, we can
express the problem using the FTs as

	R = IK
and so
	k = FT{ R/I }

Consider what we expect "k" to look like:  most PSFs have roughly gaussian
forms, and since the convolution of a gaussian with a gaussian is also a
gaussian, we expect "k" to be roughly gaussian.  This can be demonstrated
directly in the formula above: recall that the FT of a gaussian is also a
gaussian, as is the product (or ratio, provided the exponent remains negative)
of two gaussians.
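
The naive solution k = FT{ R/I } can be sketched with numpy FFTs (a minimal
illustration; the array names are hypothetical, and no treatment of noise is
attempted here):

```python
import numpy as np

def naive_kernel(i_img, r_img):
    """Naive k = FT{ R/I }: valid only where I is well above the noise."""
    R = np.fft.fft2(r_img)
    I = np.fft.fft2(i_img)
    # ratio of transforms, back-transformed: "k" is roughly gaussian in practice
    return np.real(np.fft.ifft2(R / I))
```

In real data this direct ratio is only the starting point; the rest of these
notes deal with repairing its noise-dominated components.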

A difficulty arises because most real images have little power in
high-frequency components, whereas noise typically has a flat power spectrum. 
Thus, the high-frequency components will be dominated by noise, and when the
ratio of R and I is taken the high-frequency components in the ratio will be
poorly behaved.  The question is how to treat these unrealistic high-frequency 
components.

The standard approach is simply to suppress the high-frequency components.  In 
practice, once the noise starts dominating there is essentially no information 
to be had and it is best to simply filter out these components with a (modified)
step function.  However, using any filter introduces another problem, easily 
demonstrated.  Instead of 

	k = FT{ R/I }
we have
	k' = FT{ (R/I)F } = FT{ KF } = k*f

where F is the filter and "f" is its transform.  If F is a step function,
then k*f will be our desired kernel convolved with a sinc function broader
than one pixel, so the resulting kernel k' will be too broad.
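
This broadening is easy to see in a 1-dimensional sketch (the sizes and cutoff
below are illustrative only): truncating a gaussian K with an abrupt step
filter gives back a kernel with a widened core and sinc ringing:

```python
import numpy as np

N = 64
s = np.fft.fftfreq(N) * N              # signed frequency bin index
K = np.exp(-(s / 4.0)**2 / 2)          # a gaussian K, so k is gaussian
F = (np.abs(s) <= 3).astype(float)     # abrupt low-pass step filter
k  = np.real(np.fft.ifft(K))           # the desired kernel
kf = np.real(np.fft.ifft(K * F))       # k' = k*f, f a sinc: broader, ringing
```

The filtered kernel kf has a lowered peak and negative sidelobes that the true
kernel k lacks.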

Instead of filtering, consider replacing the high-frequency components with a
reasonable guess.  If we assume that the kernel is gaussian, and hence 
its transform K is also gaussian, we can model the low-frequency 
components of K (which have high signal-to-noise) with an elliptical 
gaussian and use this model to replace the high-frequency (low signal-to-noise) 
components.  Provided the residuals between the ideal case and our gaussian 
model are small, the deviation from the ideal kernel will be small.
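
A 1-dimensional sketch of the replacement (the function name, the fit on
log|K|, and the bin counts are my own illustrative choices; PSFM's actual fit
is an elliptical gaussian in 2-d):

```python
import numpy as np

def replace_high_freq(absK, freqs, n_fit, cutoff):
    """Fit log|K| = c0 + c1*s**2 (a zero-centred gaussian) to the first
    n_fit high signal-to-noise bins, then substitute the model for the
    bins at and beyond 'cutoff'."""
    c1, c0 = np.polyfit(freqs[:n_fit]**2, np.log(absK[:n_fit]), 1)
    model = np.exp(c0 + c1 * freqs**2)
    out = absK.copy()
    out[cutoff:] = model[cutoff:]       # replace the noise-dominated bins
    return out
```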


II.  Phase Information

In order to avoid problems with shifts, etc. in the gaussian modelling above, 
it is necessary to work with the absolute value of K.  However, the phase 
information is present and should be preserved.  From the shift theorem, we 
have

	FT{ f(x-c) } = exp (i2pi cs) F(s)

(and so |FT{ f(x-c) }| = |F|, i.e., phase information is lost, as required for
the gaussian modelling); this gives

	Re (F) = cos (2pi cs) |F(s)|
	Im (F) = sin (2pi cs) |F(s)|

and we can solve for the unknown "c" via

	tan (2pi cs) = Im (F) / Re (F)

In our problem, the arbitrary function F above would be our K = R/I.
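
A sketch of recovering the shift "c" in a discrete FT (note that numpy's
forward transform carries the opposite sign convention, exp(-i2pi cs/N), so
the recovered phase slope flips sign relative to the formula above):

```python
import numpy as np

N = 16
c = 3                                   # the unknown shift, in pixels
f = np.zeros(N)
f[c] = 1.0                              # a point function shifted by c
F = np.fft.fft(f)
# np.angle is arctan2(Im, Re), i.e. the tan relation above; the phase
# slope here is -2*pi*c*s/N, so solve at the first frequency bin s = 1
c_est = -np.angle(F[1]) * N / (2 * np.pi)
```

In our problem F would be K = R/I, and each frequency bin provides an
estimate of the slope.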


III.  Normalization

We have assumed so far that 

	r = i*k

Actually the real problem is

	r' = i'*k
where
	r' = r - br
	i' = i - bi

with "r" and "i" being the actual images and "br" and "bi" representing the
constant background levels.  Rewriting gives

	(r - br) = (i - bi)*k
or
	r = i*k + (br - bi*k)

It must be recognized that "k" does not necessarily have unit flux when
integrated over all space.  Let us restate the problem in perhaps its simplest
general form:

	r = (ai + b)*k = i*(ak) + b*k

where "k" is now a unit-flux kernel, and "a" and "b" are the slope and offset of 
a linear intensity transform.  In this representation, "k" contains all
information about seeing/etc., "a" describes the exposure/transparency/etc.
ratios and "b" corrects for the sky background.  Furthermore, since "k" now
has unit flux, 

	b*k = b

and "b" and "k" are now independent.  This means that b does not affect the 
solution for "k"; conversely, "b" must be determined outside of the solution 
for "k".

The lowest-frequency component of the FT of a function is a real value equal 
to the integral of the function over all space.  In the discrete FT of an
image, this value (which I will refer to as the "central pixel" [since this is
where it appears in the power spectrum in PSFM]) is the sum of the values
over the image; hence, the central pixel of R and I are

	C(R) = sum (r') + sum (br) = sum (r') + n br
	C(I) = sum (i') + sum (bi) = sum (i') + n bi

where n is the number of pixels.  However, since we want "k" to be independent
of the background values, the
central pixel of K must have the value it would have in the case of no
background, and so instead of

	C(K) = C(R) / C(I) = sum (r) / sum (i)

which we get from K = R/I, we must set the central pixel of K to

	C(K) = sum (r') / sum (i')

Furthermore, since

	r' = i'*ak

and since "k" has unit flux, we find

	sum (r') = sum (i'*ak) = a sum (i')
so
	C(K) = sum (r') / sum (i') = a
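
In code the normalization amounts to the following (a minimal sketch; the
background levels br and bi would come from sky estimation, which is outside
the scope of these notes):

```python
import numpy as np

def central_pixel_value(r, i, br, bi):
    """a = sum(r - br) / sum(i - bi): the value to force into C(K)."""
    return (r - br).sum() / (i - bi).sum()
```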

Unless we have a priori knowledge of this intensity scale "a", we must rely
on modelling to estimate its value.  We may do this by modelling R and I 
independently, or by modelling K as we do for the replacement of high-
frequency components.  The first method is probably more valid under the 
right circumstances -- when "r" and "i" are essentially equivalent to the 
(nearly gaussian) PSF; however, for arbitrary fields the FT of each may not 
be well represented by a gaussian power spectrum.  This determination of "a" 
is perhaps the most dangerous part of the solution for "k".

[NOTE: the task PSFM currently uses the first method, using the two pixels 
adjacent to the central pixel to determine a gaussian; this is done in x and 
y and the results averaged.  This approach is valid provided there is no 
structure of wavelength nx/2 to nx/3.  If there is, the choice of a gaussian 
model over the first 2 pixels is clearly invalid; however, it may still work.]

Notice that when we solve for "k" via the ratio of the FTs we actually 
get "ak": again, assuming no background (whose effect is confined solely to 
the central pixel) we have

	r = ai*k = i*ak
	ak = FT{ R/I }

Setting the central pixel C(K) to "a" is consistent with this.


IV.  Degradation Requirement

The solution for "k" is exact under the assumptions of the discrete FT, which 
is to say that if "i" is periodic beyond the edges of the image, convolving 
with "k" will produce "r".  However, this is not what we want (we already have 
"r"!).  What we generally want is to compare one or more sections of a large 
image to determine the effects of seeing/guiding/etc. on the PSFs and to 
produce a finite kernel that degrades images in "i" to match those of "r". 
If we cannot degrade -- if image quality in "r" is better than in "i" -- we 
want no action.  No action would be accomplished by a "k" consisting of a 
point function (i.e., the kernel has one non-zero pixel) of height "a".  The 
FT of such a function is simply a constant "a" (note: shifts are ignored here).

A common situation arises when it is not possible by means of a single 
degradation to match the PSFs in "i" and "r".  For example, suppose that the
"r" and "i" had equivalent seeing but the declination guiding error in "r" 
was larger than in "i".  Clearly, we would need to degrade "i" along the 
declination axis.  Now suppose that "i" had a larger RA guiding error than "r"; 
clearly we would have to degrade "r" along the RA axis -- we cannot improve
"i" along this axis.  However, during the first solution for "k", the
values of K = R/I will be greater than "a" along the RA axis (since I will fall
to zero more rapidly than R in this direction); this would result in unwanted
(and unrealistic) action along the RA axis.  The solution is simply to limit
K to a maximum value of "a", with the result that "k" will be of unit width
along the RA axis.

Thus, K is always truncated at a maximum value of "a".
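
The truncation caps the modulus of K while preserving its phase (a sketch;
the guard against zero modulus is an implementation detail I have added):

```python
import numpy as np

def truncate_K(K, a):
    """Cap |K| at "a" without disturbing the phase."""
    mag = np.abs(K)
    scale = np.minimum(1.0, a / np.maximum(mag, 1e-30))
    return K * scale
```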


V.  Miscellaneous Comments

The convolution kernel will be approximately gaussian in shape, and may
contain a sinc component representing a shift.  Since convolution is a costly
operation, the smaller the kernel, the better.  Furthermore, as we get further
from the central pixel of the kernel, non-zero values become less realistic
in terms of what we would expect from the effects of seeing and guiding. 
Also, during the convolution, image defects such as cosmic rays will grow to the
size of the kernel; again, this argues for using as small a kernel as is
realistic (and for cleaning up cosmic rays and defects before the convolution to
diminish their effect on nearby data).

One way of minimizing the kernel size is to make the image registration as 
accurate as possible prior to the PSF matching operation: this minimizes the 
sinc component, which decays relatively slowly.

It would be difficult to build an algorithm for the selection of kernel size. 
Parameters which are important are:

	a) how much flux is contained in the selected kernel (or how little
	would be thrown away or excluded by trimming the kernel);
	b) what is the maximum variation that would be excluded;
	c) what is the rms in the excluded regions.

The effect of noise will be to raise the rms over the entire kernel image.  One
way to reduce this effect is to solve for the kernel in several sections and 
average the results.  In principle, several sections could be added together 
for better signal-to-noise and the sum input to the PSF matching task, but
a warning is called for: the broader the features become when summed (due to
different centering), the narrower their FTs will be, and so the usable part of
the ratio may become smaller.

In any case, the larger the ratio of signal to noise in the sections, the 
better.  This means that small sections each just containing a bright star are 
preferable to large sections containing several stars and a lot of blank sky.
