Post on 13-Jun-2020
Morning Section: Introductory Material
Building Your Own Wavelets at Home
Wim Sweldens and Peter Schr�oder
Chapter 1
First Generation Wavelets
1.1 Introduction
Wavelets have been making an appearance in many pure and applied areas of science and
engineering. Computer graphics with its many and varied computational problems has been no
exception to this rule. In these notes we will attempt to motivate and explain the basic ideas
behind wavelets and what makes them so successful in application areas.
The main motivation behind the development of wavelets and the many related ideas (see
Figure 1.1) was the search for fast algorithms to compute compact representations of functions
and data sets. How can such compact representations be achieved? There are many approaches,
some more computationally intensive, others less, but they all amount to exploiting structure
in the data or underlying functions. Depending on the application area this goes by di�erent
names such as the exploitation of \structure," \smoothness," \coherence," or \correlation." Of
course, for purely random signals or data no compact representations can be found. But most
of the time we are interested in realistic data and functions which do exhibit some smoothness
or coherence. In these cases wavelets and the fast wavelet transform turn out to be very useful
tools.
While the name \wavelets" is relatively young (early 80's,) the basic ideas have been around
for a long time in many areas from abstract analysis to signal processing and theoretical physics.
The main contribution of the wavelet �eld as such has been to bring together a number of
similar ideas from di�erent disciplines and create synergy between these techniques. The result
is a exible and powerful toolbox of algorithmic techniques combined with a solid underlying
17
MultigridBesov spaces
Subdivision
Adaptive gridding
Multiresolution analysis
Surfaces
Time frequency analysis
Wavelets
Transient analysis
Coherent states
Image compressionSubband filtering
Splines
Integral equations
Figure 1.1: Many areas of science, engineering, and mathematics have contributed to the
development of wavelets. Some of these are indicated surrounding the center bubble.
theory.
Because of the di�erent \parents" of wavelets, there are many ways to motivate their con-
struction and understand their properties. One example is subband �ltering from the area of
signal processing, where the aim is to decompose a given signal into frequency bands. In this
case �lter design and Fourier analysis are essential tools. Researchers in approximation theory
and abstract analysis were interested in the characterization of function spaces de�ned through
various notions of smoothness. Yet others were attempting to build approximate eigenfunctions
for certain integral operators to enhance their understanding of the underlying structures.
Instead of retracing these developments we will focus primarily on the idea of coherence, or
smoothness, and its exploitation to motivate and derive the wavelet transform together with a
large class of di�erent wavelets. The tool that we use to build wavelets transforms is called the
lifting scheme [27, 26]. The main feature of the lifting scheme is that all constructions are derived
in the spatial domain. This is in contrast to the traditional approach, which relies heavily on
the frequency domain. Staying in the spatial domain leads to two major advantages. First,
it does not require the machinery of Fourier analysis as a prerequisite. This leads to a more
intuitively appealing treatment better suited to those interested in applications, rather than
mathematical foundations. Secondly, lifting leads to algorithms that can easily be generalized
to complex geometric situations which typically occur in computer graphics. This will lead to
so-called \Second Generation Wavelets."
18
The lifting scheme was developed in 1994, but has numerous connections with earlier devel-
opments and can even be traced back all the way to the Euclidean algorithm! The development
of lifting was inspired by earlier work of Lounsbery et al. concerning wavelet transforms of
meshes [18] and work of Donoho concerning interpolating wavelet transforms [13]. Both these
developments are special cases of lifting. Lifting is also closely related to �lter bank constructions
of Vetterli and Herley [28] and local decompositions of Carnicer, Dahmen and Pe~na [2].
To make the treatment as accessible as possible we will take a very \nuts and bolts," algo-
rithmic approach. In particular we will initially ignore many of the mathematical details and
introduce the basic techniques with a sequence of examples. Other sections will be devoted to
more formal and rigorous mathematical descriptions of the underlying principles. These sections
are marked with an asterisk and can be skipped on �rst reading.
In this �rst chapter, we treat the classical or \First Generation Wavelets." We begin with a
simple example of a wavelet transform to introduce the basic ideas. Later we introduce lifting
in general and move on to the mathematical background. The second chapter is concerned with
generalizations to more complex geometries and \Second Generation Wavelets."
A word of caution is in order before we dive in the wavelet sea. Some readers might be familiar
with other overview or tutorial material concerning wavelets. In most cases these expositions use
the classical frequency domain framework. Since we are staying entirely in the spatial domain
our exposition may initially look rather foreign. This is due to the fact that our approach relies
entirely on the new lifting philosophy. However, we assure the reader that by the end of the
�rst chapter the connections between lifting and the classical treatment will be apparent. We
hope that as a result of this approach the reader will gain new insight into what makes wavelets
\tick."
1.2 A Simple Example: The Haar Wavelet
Consider two numbers a and b and think of them as two neighboring samples of a sequence. So a
and b have some correlation which we would like to take advantage of. We propose a well-known,
simple linear transform which replaces a and b by their average s and di�erence d:
s =a+ b
2d = b� a: (1.1)
19
The idea is that if a and b are highly correlated, the expected absolute value of their di�erence d
will be small and can be represented with fewer bits. In case that a = b the di�erence is simply
zero. We have not lost any information because given s and d we can always recover a and b as:
a = s� d=2
b = s+ d=2:
These reconstruction formulas can be found by inverting a 2� 2 matrix.
This simple observation is the key behind the so-called Haar wavelet transform. Consider a
signal sn of 2n sample values sn;l:
sn = fsn;l j 0 � l < 2ng:
Apply the average and di�erence transform onto each pair a = s2l and b = s2l+1. There are
2n�1 such pairs (l = 0 : : : 2n�1); denote the results by sn�1;l, and dn�1;l:
sn�1;l =sn;2l + sn;2l+1
2dn�1;l = sn;2l+1 � sn;2l:
The input signal sn, which has 2n samples, is split into two signals: sn�1 with 2n�1 averages
sn�1;l and dn�1 with 2n�1 di�erences dn�1;l. Given the averages sn�1 and di�erences dn�1 one
can recover the original signal sn.
We can think of the averages sn�1 as a coarser resolution representation of the signal sn and
of the di�erences dn�1 as the information needed to go from the coarser representation back to
the original signal. If the original signal has some local coherence, e.g., if the samples are values
of a smoothly varying function, then the coarse representation closely resembles the original
signal and the detail is very small and thus can be represented e�ciently.
We can apply the same transform to the coarser signal sn�1 itself. By taking averages and
di�erences, we can split it in a (yet) coarser signal sn�2 and another di�erence signal dn�2 where
each of them contain 2n�2 samples. We can do this n times before we run out of samples, see
Figure 1.2. This is the Haar transform. We end up with n detail signals dj with 0 � j � n� 1,
each with 2j coe�cients, and one signal s0 on the very coarsest scale. The coarsest level signal
s0 contains only one sample s0;0 which is the average of all the samples of the original signal, i.e.,
it is the DC component or zero frequency of the signal. By using the inverse transform we start
20
sn sn�1
dn�1
sn�2
dn�2
: : : s1
d1
s0
d0
-�����
-�����
-�����
-�����
Figure 1.2: Structure of the wavelet transform: recursively split into averages and di�erences.
s0
d0
s1
d1
s2
d2
: : : sn�1;l
dn�1;l
sn;l-
@@@@R
-
@@@@R
-
@@@@R
-
@@@@R
Figure 1.3: Structure of the inverse wavelet transform: recursively merge averages and di�er-
ences.
from s0 and dj for 0 � j < n and obtain sn again. Note that the total number of coe�cients
after transform is 1 for s0 plus 2j for each dj . This adds up to
1 +n�1Xj=0
2j = 2n;
which is exactly the number of samples of the original signal. The whole Haar transform can be
thought of as applying a N �N matrix (N = 2n) to the signal sn. The cost of computing the
transform is only proportional to N . This is remarkable as in general a linear transformation
of an N vector requires O(N2) operations. Compare this to the Fast Fourier Transform, whose
cost is O(N logN). It is the hierarchical structure of a wavelet transform which allows switching
to and from the wavelet representation in O(N) time.
21
1.3 Haar and Lifting
In this section we propose a new way of looking at the Haar transform. The novelty lies in the
way we compute the di�erence and average of two numbers a and b. Assume we want to compute
the whole transform in-place, i.e., without using auxiliary memory locations, by overwriting the
locations that hold a and b with the values of respectively s and d. This can not immediately
be done with the formulas of (1.1). Indeed, assume we want to store s in the same location
as a and d in the same location as b. Then the formulas (1.1) would lead to the wrong result.
Computing s and overwriting a leads to a wrong d (assuming we compute the average after the
di�erence.) We therefore suggest an implementation in two steps. First we only compute the
di�erence:
d = b� a;
and store it in the location for b. As we now lost the value of b we next use a and the newly
computed di�erence d to �nd the average as:
s = a+ d=2:
This gives the same result because a+ d=2 = a + (b � a)=2 = (a + b)=2. The advantage of the
splitting into two steps is that we can overwrite b with d and a with s, requiring no auxiliary
storage. A C-like implementation is given by
b -= a; a += b/2;
after which b contains the di�erence and a the average. The computations can be done in-place.
Moreover we can immediately �nd the inverse without formally solving a 2� 2 system: simply
run the above code backwards! (i.e., change the order and ip the signs.) Assume a contains
the average and b the di�erence. Then
a -= b/2; b += a;
recovers the values a and b in their original memory locations. This particular scheme of writing
a transform is a �rst, simple instance of the lifting scheme.
1.4 The Lifting Scheme
In this section we describe the lifting scheme in more detail. Consider a signal sj with 2j samples
which we want to transform into a coarser signal sj�1 and a detail signal dj�1. A typical case
22
sj - Split
-
P
?
?
�����
evenj�1
oddj�1
U
-����+ -
-
6
6dj�1
sj�1
Figure 1.4: The lifting scheme, forward transform: �rst compute the detail as the failure of a
prediction rule, then use that detail in an update rule to compute the coarse signal.
of a wavelet transform built through lifting consists of three steps: split, predict, and update.
Let us discuss each stage in more detail.
� Split: This stage does not do much except for splitting the signal into two disjoint sets of
samples. In our case one group consists of the even indexed samples s2l and the other group
consists of the odd indexed samples s2l+1. Each group contains half as many samples as
the original signal. The splitting into even and odds is a called the Lazy wavelet transform.
We thus built an operator so that
(evenj�1; oddj�1) := Split(sj)
Remember that in the previous example a was an even sample while b was an odd sample.
� Predict: The even and odd subsets are interspersed. If the signal has a local correlation
structure, the even and odd subsets will be highly correlated. In other words given one
of the two sets, it should be possible to predict the other one with reasonable accuracy.
We always use the even set to predict the odd one. In the Haar case the prediction is
particularly simple. An odd sample sj;2l+1 will use its left neighboring even sample sj;2l as
its predictor. We then let the detail dj�1;l be the di�erence between the odd sample and
its prediction:
dj�1;l = sj;2l+1 � sj;2l;
which de�nes an operator P such that
dj�1 = oddj�1 � P (evenj�1):
23
As we already argued, it should be possible to represent the detail more e�ciently. Note
that if the original signal is a constant, then all details are exactly zero.
� Update: One of the key properties of the coarser signals is that they have the same
average value as the original signal, i.e., the quantity
S = 2�j2j�1Xl=0
sj;l
is independent of j. This results in the fact that the last coe�cients s0;0 is the DC
component or overall average of the signal. The update stage ensures this by letting
sj�1;l = sj;2l + dj�1;l=2:
Substituting this de�nition we easily verify that
2j�1Xl=0
sj�1;l =2j�1Xl=0
(sj;2l + dj�1;l=2) = 1=22j�1Xl=0
(sj;2l + sj;2l+1) = 1=22jXl=0
sj;l;
which de�nes an operator U of the form
sj�1 = evenj�1 + U(dj�1)
All this can be computed in-place: the even locations can be overwritten with the averages and
the odd ones with the details. An abstract implementation is given by:
(oddj�1; evenj�1) := Split(sj);
oddj�1 -= P (evenj�1);
evenj�1 += U(oddj�1);
These three stages are depicted in a wiring diagram in Figure 1.4.
We can immediately build the inverse scheme, see the wiring diagram in Figure 1.5. Again
we have three stages:
� Undo update: Given dj and sj we can recover the even samples by simply subtracting
the update information:
evenj�1 = sj�1 � U(dj�1):
In the case of Haar, we compute this by letting
sj;2l = sj�1;l � dj�1;l=2:
24
sj-Merge
?
6
P
?
?
����+
evenj�1
oddj�1
U
�����
-
-
6
6dj�1
sj�1
Figure 1.5: The lifting scheme, inverse transform: �rst undo the update and recover the even
samples, then add the prediction to the details and recover the odd samples.
� Undo predict: Given evenj�1 and dj�1 we can recover the odd samples by adding the
prediction information
oddj�1 = dj�1 + P (evenj�1):
In the case of Haar, we compute this by letting
sn;2l+1 = dn�1;l + sn;2l:
� Merge: Now that we have the even and odd samples we simply have to zipper them
together to recover the original signal. This is the inverse Lazy wavelet:
sj = Merge(evenj�1; oddj�1):
Assuming that the even slots contain the averages and the odd ones contain the di�erence, the
implementation of the inverse transform is:
evenj�1 -= U(oddj�1);
oddj�1 += P (evenj�1);
sj := Merge(oddj�1; evenj�1)
The inverse transform is thus always found by reversing the order of the operations and ipping
the signs.
The lifting scheme has a number of algoritmic advantages
� In-place: all calculations can be performed in-place which can be an important memory
savings.
25
� E�ciency: in many cases the number of oating point operations needed to compute
both smooth and detail parts is reduced since subexpressions are reused.
� Parallelism: \unrolling" a wavelet transform into a wiring diagram exhibits its inherent
SIMD parallelism at all scales, with single write and multiple read semantics.
But perhaps more importantly lifting has some structural advantages which are both theoreti-
cally and pratically relevant:
� Inverse Transform: writing the wavelet transform as a sequence of elementary predict
and update (lifting) steps, it is immediately obvious what the inverse transform is: simply
run the code backwards. In the classical setting, the inverse transform can typically only
be found with the help of Fourier techniques.
� Generality: this is the most important advantage. Since the design of the transform is
performed without reference to Fourier techniques it is very easy to extend it to settings in
which, for example, samples are not placed evenly or constraints such as boundaries need
to be incorporated. It also carries over directly to curves, surfaces and volumes.
It is for these reasons that we built our exposition entirely around the lifting scheme.
1.5 The Linear Wavelet Transform
One way to build other wavelet transforms is through the use of di�erent predict and/or update
steps. What is the incentive behind improving predict and update? The Haar transform uses
a predictor which is correct in case the original signal is a constant. It eliminates zeroth order
correlation. We say that the order of the predictor is one. Similarly the order of the update
operator is one as it preserves the average or zeroth order moment. In many cases it is desirable
to have predictors which can exploit coherence beyond zeroth order correlation and similarly it
is often desirable to preserve higher order moments beyond the zeroth in the successively coarser
versions of the function.
In this section we build a predictor and update which are of order two. This means that the
predictor will be exact in case the original signal is a linear and the update will preserve the
average and the �rst moment. This turns out to be fairly easy. For an odd sample sj;2l+1 we let
the predictor be the average of the neighboring sample on the left (sj;2l) and the neighboring
26
Linear approximation over all samples
Difference: original minus predict
Linear approximation over even samples
Linear prediction at odd based on even locations
Figure 1.6: Example of linear prediction. On the top left the original signal with a piecewise
linear approximation. To its right is the coarser approximation based only on the even samples.
Using the even samples to predict values at the odd locations based on a linear predictor is shown
in the bottom left. The detail coe�cients are de�ned as the di�erence between the prediction and
the actual value at the odd locations (bottom right.) We may think of this as the failure of the
signal to be locally like a �rst degree polynomial.
sample on the right (sj;2l+2.) The detail coe�cient is given by
dj;l = sj;2l+1 � 1=2(sj;2l + sj;2l+2):
Figure 1.6 illustrates this idea. Notice that if the original signal was a �rst degree polynomial,
i.e., if sl = � l+� for some � and �, this prediction is always correct and all details are zero, i.e.,
N = 2. In other words, the detail coe�cients measure to which extent the original signal fails
to be linear. The expected value of their magnitudes is small. In terms of frequency content,
the detail coe�cients capture high frequencies present in the original signal.
In the update stage, we �rst assure that the average of the signal is preserved or
Xl
sj�1;l = 1=2Xl
sj;l:
We therefore update the even samples sj;2l using the previously computed detail signals dj�1;l.
27
Again we use the neighboring wavelet coe�cients and propose an update of the form:
sj�1;l = sj;2l +A (dj�1;l�1 + dj�1;l):
To �nd A we compute the average:
Xl
sj�1;l =Xl
sj;2l + 2AXl
dj�1;l = (1� 2A)Xl
sj;2l + 2AXl
sj;2l+1:
From this we get A = 1=4 as the correct choice to maintain the average. Because of the symmetry
of the update operator we also preserve the �rst order moment or
Xl
l sj�1;l = 1=2Xl
l sj;l:
One step in the wavelet transform is shown in the scheme in Figure 1.7. By iterating this scheme
we get a complete wavelet transform. The inverse is as easy to compute, letting
sj;2l = sj�1;l � 1=4 (dj�1;l�1 + dj�1;l);
to recover the even and
sj;2l+1 = dj;l + 1=2(sj;2l + sj;2l+2);
to recover the odd samples.
Remarks:
� In this section we have not mentioned anything about what should happen at the edges of
the signal. For now one can assume that signals are either periodic or in�nite. In a later
section, we show how lifting can be used to correctly deal with edge e�ects.
� The wavelet transform presented above is the biorthogonal (2,2) of Cohen-Daubechies-
Feauveau [5]. One might not immediately recognize this, but by substituting the predict
in the update, one can check that the coarser coe�cients are given by
sj�1;l = �1=8 sj;2l�2 + 1=4 sj;2l�1 + 3=4 sj;2l + 1=4 sj;2l+1 � 1=8 sj;2l+2:
Note that, when written in this form (which is not lifting,) the transform cannot be
computed in-place. Also to �nd the inverse transform one would have to rely on Fourier
techniques. This is much less intuitive and does not generalize to irregular settings.
28
sj�1;k dj�1;k sj�1;k+1
sj;2k dj�1;k sj;2k+2
sj;2k sj;2k+1 sj;2k+2
� � �
� � �
� � �
� � �
� � �
� � �
?
?
?
@@@@R
��
��
����
@@@@R
�1=2 �1=2 �1=2�1=2
����
@@@@R
1=4 1=41=4 1=4 1=41=4����
@@@@R
? ?
?
Figure 1.7: At the top the initial vector of coe�cients. As a �rst step all odd locations have
1=2 of their neighboring even locations subtracted. In the second step all even locations get a
contribution of 1=4 of their neighboring odd locations leaving the smooth and detail coe�cients.
Now one can recurse by repeating the same set of operations with a memory stride of 2, 4, 8,
and so forth. In the end the entire transform sits in the original memory locations.
1.6 Subdivision Methods
In the previous section we used the ideas of predict and update to build wavelet transforms. As
examples of prediction steps we saw constant prediction and linear prediction. In this section
we will focus on subdivision, which is a powerful paradigm to build predictors. Subdivision is
used extensively in CAGD to generate curves and surfaces. In that context it is used to re�ne a
given mesh (1D or 2D) through a simple local procedure. Choosing this procedure carefully will
result in an ever better approximation of some smooth limit curve or surface. Spline methods
with the de Casteljau and de Boor algorithms, as well as certain interpolating subdivisions, such
as the method of Deslauriers-Dubuc, fall into this category.
Considering subdivision methods as sources of predictors corresponds to focusing on the
design of various forms of P function box(es) in wavelet transform wiring diagrams such as
shown in Figures 1.4 and 1.5. Later on we will see how to construct suitable U function boxes,
but for now we will work in the setting without a U box, see Figure 1.8. Equivalently, we may
think of subdivision as an inverse wavelet transform with no detail coe�cients. In this context,
subdivision is often referred to as \the cascade algorithm." In later sections, we will see how for
a given subdivision scheme one can de�ne di�erent ways to compute the detail coe�cients.
29
Merge
odd
even
UI
P
Figure 1.8: The simplest subdivision and by implication inverse wavelet transform is interpo-
lating subdivision. Values at odd locations are computed as a function of some set of neighboring
even locations. The even locations do not change in the process.
We begin by describing interpolating subdivision which is often useful if one is interested in
constructions which interpolate a given data set. This corresponds to constructions with the
simplest form of the P function box as shown in Figure 1.8. Here subdivision corresponds to
predicting new values at the odd positions sj+1;2k+1 at the next �ner level, while all old values
sj+1;2k = sj;k remain the same.
odd
even
MergePAI
PHaar
UHaar
Figure 1.9: Average-interpolating subdivision can be used to enhance an existing inverse Haar
transform. Based on a number of even neighbors a Haar detail signal is computed and added
to the odd locations. Applying the usual inverse Haar transform yields average-interpolating
functions.
Next we will consider a subdivision which is based on average interpolation and is related
to the Haar transform and, through di�erentiation, to the interpolating transform. As the
name suggests this method interpolates local averages, rather than samples of a function. This
30
method will result in a P box which will be put in front of the inverse Haar transform as shown
in Figure 1.9.
U
even/2
Mergeeven
PPodd
odd
Figure 1.10: B-splines can also be built with lifting. For cubic B-splines this diagram results.
It is an instance of multiple P stages and also introduces a new element, scaling.
Finally we discuss cubic B-splines as another example of building predictors. In this case
we will see how powerful predictors can be built by allowing prediction to consist of several
elementary stages including rescales (see the division by 2 on the top wire in Figure 1.10.)
1.6.1 Interpolating Subdivision
Interpolating subdivision starts by considering the problem of building an interpolant for a given
data sequence. For example, we might be given a sequence of samples of some unknown function
at regular intervals and the task to �ll in intermediate values in a smooth fashion. Deslauriers
and Dubuc attacked this problem by de�ning a recursive procedure for �nding the value of
an interpolating function at all dyadic points [10, 11]. This algorithm proceeds by inserting a
new predicted coe�cient inbetween each pair of existing coe�cients. Since none of the already
existing coe�cients will get changed interpolation of the original data is assured.
Perhaps the simplest way to set up such an interpolating subdivision scheme is the following.
Let fs0;kg with k 2 Z be the original sample values. Now de�ne a re�ned sequence of sample
values recursively as
sj+1;2k = sj;k
sj+1;2k+1 = 1=2 (sj;k + sj;k+1);
and place the sj;k at locations xj;k = k 2�j . Or in words, new values are inserted halfway
between old values by linearly interpolating the two neighboring old values (see Figure 1.11.)
31
1/21/2
1/21/2
1/2
1/2
1/2
1/2
1/2
1/2
1/21/21/2
1/2
1/21/2
1/2
1/2
Figure 1.11: The linear subdivision, or prediction, step inserts new values inbetween the old
values by averaging the two old neighbors. Repeating this process leads in the limit to a piecewise
linear interpolation of the original data.
32
This subdivision rule was already used in the linear prediction transformation in Section 1.5.
In the limit, values at all dyadic points will be de�ned and the function can be extended to
a continuous function over all real numbers. The result is a piecewise linear interpolation of
the original sample values. Suppose the initial sample values given to us were actually samples
of a linear polynomial. In that case our subdivision scheme will exactly reproduce that linear
polynomial. We then say that the order of the subdivision scheme is 2. The order of polynomial
reproduction is important in quantifying the quality of a subdivision (or prediction) scheme.
To see how to build more powerful versions of such subdivisions we look at the procedure
above in a slightly di�erent light. Instead of thinking of it as averaging we can describe it via
the construction of an interpolating polynomial. Given data sj;k and sj;k+1 at xj;k = k 2j and
xj;k+1 = (k + 1) 2j determine the unique linear polynomial which interpolates this data. Now
de�ne the new coe�cient sj+1;2k+1 as the value that this polynomial takes on at xj+1;2k+1 =
(2k + 1) 2j+1, which is halfway between k 2j and (k + 1) 2j . Given this way of looking at the
problem an approach to build higher order predictors immediately suggests itself: instead of
using only immediate neighbors to build a linear interpolating polynomial, use more neighbors
on either side to build higher order interpolating polynomials and de�ne the new value by
evaluating the resulting polynomial at the new midpoint.
For example, we can use two neighboring values on either side and de�ne the (unique) cubic
polynomial p(x) which interpolates those four values
sj;k�1 = p(xj;k�1)
sj;k = p(xj;k)
sj;k+1 = p(xj;k+1)
sj;k+2 = p(xj;k+2):
The new coe�cient at xj+1;2k+1 is de�ned to be the value that this cubic polynomial takes on
at the midpoint. With the old samples untouched we get
sj+1;2k = sj;k
sj+1;2k+1 = p(xj+1;2k+1):
Note that the polynomial is in general a di�erent one for each successive set of 4 old values.
Figure 1.12 illustrates these ideas with the linear prediction on the left side and the cubic
prediction on the right.
33
linear interpolation
linear interpolation
cubic interpolation
cubic interpolation
Figure 1.12: On the left, the linear subdivision (or prediction) step inserts new values inbetween
the old values by averaging the two old neighbors. On the right cubic polynomials are used for
every quad of old values to determine a new inbetween value.
Observing that this unique cubic interpolating polynomial is itself a linear function of the
values fsj;k�1; sj;k; sj;k+1; sj;k+2g we �nd, after some algebraic manipulation|and under the
assumption that the associated xj;k are equally spaced|the new value sj+1;2k+1 to be a simple
weighting of fsj;k�1; sj;k; sj;k+1; sj;k+2g with weights f�1=16; 9=16; 9=16;�1=16g. This stencil is
well known in the CAGD literature as the 4-point scheme [16].
In general we use N (N = 2D even) samples and build a polynomial of degree N � 1. For
each group of N = 2D coe�cients fsj;k�D+1; : : : ; sj;k; : : : ; sj;k+Dg, it involves two steps:
1. Construct a polynomial p of degree N � 1 so that
sj;k+l = p(xj;k+l) for �D + 1 � l � D:
2. Calculate one coe�cient on the next �ner level as the value of this polynomial at xj+1;2k+1
sj+1;2k+1 = p(xj+1;2k+1):
We say that the order of the subdivision scheme is N .
34
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
Figure 1.13: Scaling functions resulting from interpolating subdivision. Going from left to
right, top to bottom the order N of the subdivision is 2, 4, 6, and 8. Each function takes on the
value one at the origin and zero at all other integers. Note that the functions have support, i.e.,
they are non-zero, over somewhat larger intervals than shown here.
What makes interpolating subdivision so attractive from an implementation point of view is
that we only need a routine which can evaluate an interpolating polynomial at a single location
given some number of sample values and locations. The new sample value is de�ned through
evaluation of this polynomial at the new, re�ned location. A particularly e�cient (and stable)
procedure for this is Neville's algorithm [24, 21]. If all samples are evenly spaced this polynomial
and the associated weights need to be computed only once and can be used from then on. Notice
also that nothing in the de�nition of this procedure requires the original samples to be located
at integers. This feature can be used to de�ne scaling functions over irregular subdivisions.
Interval boundaries for �nite sequences are also easily accommodated. We will come back to
these observations in Chapter 2.
35
1.6.2 Interpolating Scaling Functions
Here we formally de�ne the notion of scaling functions. Each coe�cient sj;k has one scaling
function denoted as 'j;k(x) associated with it. This scaling function is de�ned as follows: set all
sj;l on level j equal to zero except for sj;k which is set to 1. Now run the interpolating subdivision
scheme starting from level j ad in�nitum. The resulting limit function is 'j;k(x). Consider some
original sequence of sample values sj;k at level j. Simply using linear superposition and starting
the subdivision scheme at level j, yields a limit function f(x) of the form
f(x) =Xk
sj;k 'j;k(x):
If the sample locations are regularly spaced (xj;k = k 2�j ,) it is easy to see that all scaling
functions are translates and dilates of one �xed function '(x) = '0;0(x):
'j;k(x) = '(2jx� k):
This function is also called the fundamental solution of the subdivision scheme. Figure 1.13
shows the scaling functions '0;0 which result from the interpolating subdivision of order 2, 4, 6,
and 8 (left to right, top to bottom.) In the case of linear interpolation it is easy to see that the
fundamental solution is the well known piecewise linear hat function. Perhaps surprisingly the
fundamental solution of the cubic scheme described above is not a (piecewise) cubic polynomial.
However, it is still true that the subdivision process will reproduce all cubic polynomials: if the
initial sequence of samples came from a cubic polynomial P (x) then all re�nement steps will use
exactly that polynomial p(x) = P (x) (due to uniqueness) to de�ne new intermediate values. As
a consequence all new points will be samples of the original polynomial, in the limit reproducing
the original cubic. Equivalently we say that any cubic polynomial can be written as a linear
combination of scaling functions.
The properties of '(x) in the general case are:
1. Compact support: '(x) is exactly zero outside the interval [�N +1; N � 1]. This easily
follows from the locality of the subdivision scheme.
2. Interpolation: '(x) is interpolating in the sense that '(0) = 1 and '(k) = 0 for k 6= 0.
This immediately follows from the construction.
36
3. Polynomial reproduction: Polynomials up to degree N � 1 can be expressed as linear
combinations of scaling functions. More precisely:
Xk
(k2�j)p 'j;k(x) = xp for 0 � p < N:
This can be seen by starting the scheme on level j with the sequence xpj;k
and using the
fact that the subdivision de�nition insures the reproduction of polynomials up to degree
N � 1.
4. Smoothness: Typically 'j;k 2 C� where � = �(N). We know that �(4) < 2 and
�(6) < 2:830 (strict bounds.) Also, for large N the smoothness increases linearly as
:2075N . This fact is much less trivial than the previous ones. We refer to [10, 11] and [9,
p. 226].
5. Re�nability: This means the scaling function satis�es a re�nement relation of the form
'(x) =NX
l=�N
hl '(2x � l):
This can be seen as follows. Do one step in the subdivision starting from s0;k = �k;0. Call
the result hl = s1;l. It is easy to see that only 2N + 1 coe�cients hl are non-zero. Now
start the subdivision scheme from level 1 with these values s1;l. The re�nement relation
follows from the fact that this should give the same result as starting from level 0 with
the values s0;k. Also because of interpolation, it follows that h2l = �0;l. We refer to the hl
as �lter coe�cients. With a change of variables in the re�nement relation we get
'j;k(x) =Xl
hl�2k 'j+1;l(x): (1.2)
In the case of linear subdivision, the �lter coe�cients are hl = f1=2; 1; 1=2g. The associated
scaling function is the familiar linear B-spline \hat" function. The cubic case leads to the �lter
hl = f�1=16; 0; 9=16; 1; 9=16; 0;�1=16g. For those familiar with the more traditional treatment
of wavelets, the hl coe�cients describe the impulse response sequence of the low-pass �lter
used in the inverse wavelet transform. Once we have the �lter coe�cients, we can write the
subdivision as
sj+1;l =Xk
hl�2k sj;k:
37
Figure 1.14: Examples of average-interpolation. On the left a diagram showing the constant
average-interpolation scheme. Each subinterval gets the average of a constant function de�ned by
the parent interval. This is what happens in the inverse Haar transform if the detail coe�cients
are zero. On the right the same idea is applied to higher order average-interpolation using
a neighboring interval on either side. The unique quadratic polynomial which has the correct
averages over one such triple is used to compute the averages over the subintervals of the middle
interval. This process is repeated an in�nitum to de�ne the limit function.
We see that because h2l = �0;l, the subdivision scheme is interpolating, i.e., even indexed samples
remain unchanged. Note that the hl are used in a re�nement relation to go from �ner level scaling
functions to coarser level scaling functions, while during subdivision the same hk are used to go
from coarser level samples to �ner level samples.
1.6.3 Average-Interpolating Subdivision
In contrast to the interpolating subdivision scheme of Deslauriers-Dubuc we now consider an-
other subdivision scheme: average-interpolation as introduced by Donoho [12]. To begin with
we focus on the basic idea, i.e., how to produce the new values at the �ner level directly from
the old values at the coarser level. At the end of this section, we will �t average-interpolation
into lifting as shown in the wiring diagram of Figure 1.9.
38
The starting point of interpolating subdivision was a set of samples of some function. Average-
interpolation can be motivated similarly. Suppose that instead of samples we are given averages
of some unknown function over intervals
s0;k =
Z k+1
k
f(x) dx:
For purposes of exposition we will assume that the intervals are all unit sized, a restriction which
will be removed in the second generation setting (see Chapter 2.)
Such values might arise from a physical device which does not perform point sampling, but
integration, as is done for example, by a CCD cell (to a �rst approximation.) How can we use
such values to de�ne a function whose averages are the measurement values given to us? One
obvious answer is to use these values to de�ne a piecewise constant function which takes on the
value s0;k for x 2 [k; k + 1]. This corresponds to the following constant average-interpolation
scheme
sj+1;2k = sj;k
sj+1;2k+1 = sj;k:
This is the inverse Haar transform with all detail coe�cients equal to zero. Cascading this
procedure ad in�nitum we get a function which is de�ned everywhere and is piecewise constant.
Furthermore its averages over intervals [k; k+1] match the observed averages. The disadvantage
of this simple scheme is that the limit function is not smooth. In order to understand how
to increase the smoothness of such a reconstruction we de�ne a general average-interpolating
procedure.
One way to think about the previous scheme is to describe it as follows. We assume that the
(unknown) function we are dealing with is a constant polynomial over the interval [k; k+1]. The
values of sj+1;2k and sj+1;2k+1 then follow as the averages of this polynomial over the respective
subintervals. The diagram on the left side of Figure 1.14 illustrates this scheme.
Just as before we can extend this idea to higher order polynomials. The next natural choice
is quadratic. For a given interval consider the intervals to its left and right. De�ne the (unique)
quadratic polynomial p(x) such that
sj;k�1 =
Z k 2�j
(k�1)2�jp(x) dx
sj;k =
Z (k+1)2�j
k 2�jp(x) dx
39
sj;k+1 =
Z (k+2)2�j
(k+1)2�jp(x) dx:
Now compute sj+1;2k and sj+1;2k+1 as the average of this polynomial over the left and right
subintervals of [k 2�j; (k + 1)2�j ]
sj+1;2k = 2
Z (k+1=2)2�j
k 2�jp(x) dx
sj+1;2k+1 = 2
Z (k+1)2�j
(k+1=2)2�jp(x) dx:
Figure 1.14 (right side) shows this procedure.
It is not immediately clear what the limit function of this process will look like, but it easy
to see that the procedure will reproduce quadratic polynomials. The argument is the same as in
the interpolating case: assume that the initial averages fs0;kg were averages of a given quadratic
polynomial q(x). In that case the unique polynomial p(x) which has the prescribed averages over
each triple of intervals will always be that same polynomial q(x) which gave rise to the initial
set of averages. Since the interval sizes go to zero and the averages over the intervals approach
the value of the underlying function in the limit the original quadratic polynomial q(x) will be
reproduced.
We can de�ne the scaling function exactly the same way as in the interpolating subdivision
case. In general we use N intervals (N odd) to construct a polynomial of degree N � 1. The
order of the subdivision scheme is given by N . Figure 1.15 shows the scaling functions of order
1, 3, 5, and 7 (left to right, top to bottom.)
This scheme also has the virtue that it is very easy to implement. The conditions on the
integrals of the polynomial result in an easily solvable linear system relating the coe�cients of
p(x) to the sj;k. In its simplest form (we will see more general versions later on) we can streamline
this computation even further by taking advantage of the fact that the integral of a polynomial
p(x) is itself another polynomial P (x) =R x0 p(y) dy. This leads to another interpolation problem
0 = P ((k � 1)2�j)
sj;k = P (k 2�j)
sj;k + sj;k+1 = P ((k + 1)2�j)
sj;k + sj;k+1 + sj;k+2 = P ((k + 2)2�j):
Given such a polynomial P (x) the �ner averages become
sj+1;2k = 2(P ((k + 1=2)2�j)� P (k 2�j))
40
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
Figure 1.15: Scaling functions which result from average-interpolation. Going from left to
right, top to bottom orders of the respective subdivision schemes were 1, 3, 5, and 7.
sj+1;2k+1 = 2(P ((k + 1)2�j)� P ((k + 1=2)2�j)):
This computation, just like the earlier interpolating subdivision, can be implemented in a stable
and e�cient way with Neville's algorithm.
More generally we de�ne the average-interpolating subdivision scheme of order N as follows.
For each group of N = 2D + 1 coe�cients fsj;k�D; : : : ; sj;k; : : : ; sj;k+Dg, it involves two steps:
1. Construct a polynomial p(x) of degree N � 1 so that
sj;k+l =
Z (k+l+1)2�j
(k+l)2�jp(x) dx for �D � l � D:
2. Calculate two coe�cients on the next �ner level as
sj+1;2k = 2
Z (k+1=2)2j
k 2jp(x) dx
41
sj+1;2k+1 = 2
Z (k+1)2�j
(k+1=2)2�jp(x) dx:
These de�nitions will also work if the intervals are not on a dyadic grid but have unequal
size. In that case one has to carefully account for the given interval width when computing
the averages. However, nothing fundamental changes, even in the presence of boundaries or
weighted measures. We will turn to these generalizations in Chapter 2.
There is one issue we have not yet addressed (see the remark in the �rst paragraph of this
section.) The equations we have given so far do not �t into the lifting scheme framework,
since one cannot simply overwrite sj;k with sj+1;2k as is desirable. Instead we use an already
existing inverse Haar transform and use the average-interpolating subdivision as a P box before
entering the inverse Haar transform. The diagram in Figure 1.9 illustrates this setup. Instead of
computing sj+1;2k and sj+1;2k+1 directly we will compute their di�erence dj;k = sj+1;2k+1�sj+1;2k
and feed this as a di�erence signal into the inverse Haar transform. Given that the average of
sj+1;2k and sj+1;2k+1 is sj;k, it follows that the inverse Haar transform when given sj;k and
dj;k will compute sj+1;2k and sj+1;2k+1 as desired. This leads to a transform with three lifting
steps|the average-interpolating prediction, the Haar update, the Haar prediction|and all the
advantages of lifting, such as in-place and easy invertibility remain.
1.6.4 Average-Interpolating Scaling Functions
Scaling functions are de�ned exactly the same way as in the interpolating case. If the samples are
regularly spaced, they are translates and dilates of one �xed function '(x). The limit function
f(x) of the subdivision scheme is given by
f(x) =Xk
sj;k 'j;k(x):
The properties of the associated scaling function are as follows:
1. Compact support: '(x) is exactly zero outside the interval [�N + 1; N ]. This easily
follows from the locality of the subdivision scheme.
2. Average-interpolation: '(x) is average-interpolating in the sense that
Z k+1
k
'(x) dx = �k;0:
This immediately follows from the de�nition.
42
3. Polynomial reproduction: '(x) reproduces polynomials up to degree N � 1. In other
words Xk
1=(p+ 1) ((k + 1)p+1 � kp+1)'(x� k) = xp for 0 � p < N:
This can be seen by starting the scheme with this particular coe�cient sequence and using
the fact that the subdivision reproduces polynomials up to degree N � 1.
4. Smoothness: '(x) is continuous of order R, with R = R(N) > 0. One can show that
R(3) � :678, R(5) � 1:272, R(7) � 1:826, R(9) � 2:354, and asymptotically R(N) �
:2075N [12].
5. Re�nability: '(x) satis�es a re�nement relation of the form
'(x) =NX
l=�N+1
hl '(2x� l):
This follows from similar reasoning as in the interpolating case starting from s0;k = �0;k.
As before subdivision can be written as:
sj+1;l =Xk
hl�2k sj;k:
In the case of quadratic subdivision, the �lter coe�cients are hl = f�1=8; 1=8; 1; 1; 1=8;�1=8g.
For quartic subdivision hl = f3=128;�3=128;�11=64; 11=64; 1; 1; 11=64;�11=64;�3=128; 3=128g
results. These constructions are part of the biorthogonal family of scaling functions described
by Cohen-Daubechies-Feauveau [5], where they are named (1,3) and (1,5) respectively.
An interesting connection between Deslauriers-Dubuc interpolation and average-interpolation
was pointed out by Donoho [12, Lemma 2.2]: Given some sequence fs0;kg apply interpolating
subdivision of order N = 2D to it. Comparing this to the sequence that results from applying
average-interpolation of order N 0 = 2D � 1 to the sequence fs00;k = s0;k+1 � s0;kg we �nd
sj;k = 2js0j;k (once again assuming equal sized intervals.) This observation follows directly from
the fact that the average of the derivative of an interpolating polynomial p is simply the di�erence
of the values that the polynomial takes on at the end and start of an interval, divided by the
size of the interval. As a consequence we have
d
dx'I(x) = 'AI(x+ 1)� 'AI(x); (1.3)
43
where the superscripts stand for I-nterpolating scaling function and A-verage I-nterpolating
scaling function.
This provides a simple recipe for computing the derivative of a function f(x) which results
from interpolation of some sequence fs0;kg: simply take successive di�erences �rst and then
apply the average-interpolating subdivision of one order less. Obviously, one could also take
the di�erences and then apply an interpolating subdivision method, not necessarily average-
interpolation. In that case the limit function would be an approximation of the derivative, while
Equation 1.3 holds exactly.
1.6.5 B-Spline Subdivision
B-splines and in particular cubic B-splines are a very common primitive in computer graphics.
Their popularity stems from the fact that they are C2, which is important in many rendering
applications to ensure smooth shading. They also possess the convex hull property and are
variation diminishing which makes them a favorite for curve and surface modeling tasks.
In this section we will show how cubic B-splines can be used as a predictor in wavelet con-
structions. In order to keep the exposition simple we will only consider cardinal cubic B-splines
here, i.e., all control points are associated with integer parameter positions. We will also assume
a bi-in�nite sequence for now and only discuss the boundary case later.
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
1.0
Figure 1.16: Cubic B-spline scaling function.
Given a set of cubic B-spline control points at the integers fs0;kg subdivision tells us how
to �nd a set of control points at the half integers which describe the same underlying B-spline
curve. Repeating this process as before the control points become dense and converge to the
44
actual curve (see Figure 1.16.) Typically this subdivision rule is described through a convolution
with the sequence fhkg = 1=8 f1; 4; 6; 4; 1g, i.e.,
sj+1;2k = 1=8 (sj;k�1 + 6sj;k + sj;k+1)
sj+1;2k+1 = 1=8 (4sj;k + 4sj;k+1):
As stated this implementation of cubic B-spline subdivision does not �t into our wiring diagram
framework, since it cannot be executed in place: the memory location for sj;k and sj+1;2k are
identical. Computing sj+1;2k will leave us with the wrong values needed for the computation of
sj+1;2k+2, for example!
This is the �rst instance of a subdivision method which requires a sequence of elementary P
function boxes, as well as a new primitive: scaling. Scaling �ts into the lifting philosophy as it
can be done in-place and is trivially inverted. There are a number of advantages to implementing
the cubic B-spline subdivision in that way. Aside from allowing us to do the computation in
place, it will be maximally parallel and, as we will see later, make it trivial to derive elementary
wavelets (and through further lifting an entire class of wavelets.)
We begin by averaging even coe�cients into odd locations
sj+1;2k+1 = 1=2 (sj;k + sj;k+1):
This is the Podd box in the diagram of Figure 1.10. Next apply a Peven function box to get
sj+1;2k = sj;k + 1=2 (sj+1;2k�1 + sj+1;2k+1):
Finally dividing all even locations be 2
sj+1;2k == 2
we get our desired sequence. Substitution of the de�nition of the odd locations immediately
veri�es that the result is as intended
sj+1;2k = 1=2 (1=4sj;k�1 + 6=4sj;k + 1=4sj;k+1):
We have succeeded in building cubic B-spline scaling functions with a simple inplace computation
which only involves immediate neighbors. As we will see later, choosing an appropriate U
function box to put before this inverse transform (see Figure 1.10) can give us associated spline
wavelets.
45
1.6.6 The Next Step
We have just seen a number of ways to generate scaling functions which �t well with the overall
philosophy of lifting and give the practitioner a rich set of constructions from which to choose.
Before we discuss the construction of the associated wavelets, and how they \drop out of" the
wiring diagrams we �rst consider the notion of a Multiresolution Analysis more formally in the
following section.
1.7 Multiresolution Analysis�
In this section, we will go into some more mathematical detail about multiresolution analysis as
originally conceived by Mallat and Meyer [20, 19]. In the previous section we de�ned the notion
of a scaling function '(x) and saw how all scaling functions are simply translates and dilates of
one �xed function:
'j;l(x) = '(2jx� l):
In this section we will use these functions to build a multiresolution analysis.
Assume we start a subdivision scheme of the previous section on level j from a sequence
fsj;lg. Because of linearity, we can write the resulting limit function sj(x) as
sj(x) =Xl
sj;l 'j;l(x):
The scaling functions have compact support, so that for a �xed x the summation only involves a
�nite number of terms. We next de�ne the space Vj of all limit functions obtained from starting
the subdivision on level j. This is the linear space spanned by the scaling functions 'j;l with
l 2 Z:
Vj = spanf'j;l(x) j l 2 Zg:
For example, if '(x) is the Haar scaling function, the Vj is the space of functions which are
piecewise constant on the intervals [k2�j ; (k+1)2�j) with k 2 Z. If '(x) is the piecewise linear
hat (or tent) function, Vj is the space of continuous functions which are piecewise linear on the
intervals [k2�j ; (k + 1)2�j) with k 2 Z.
The di�erent Vj spaces satisfy the following properties which make them a multiresolution
analysis:
1. Nestedness: Vj � Vj+1.
46
2. Translation: if f(x) 2 Vj then f(x+ k2�j) 2 Vj .
3. Dilation: if f(x) 2 Vj then f(2x) 2 Vj+1.
4. Completeness: every function f(x) of �nite energy (2 L2) can be approximated with
arbitrary precision with a function from Vj for suitably high j.
The nestedness property follows immediately from the re�nement relation. If we can write '(x)
as a linear combination of the '(2x� k), then we can also write 'j;l as a linear combination of
the 'j+1;k and thus Vj � Vj+1. The translation and dilation properties follow from the de�nition
of Vj . The proof of the completeness property is beyond the scope of this tutorial. We refer
to [9] for more details.
A very important property of a multiresolution analysis is the order. We say that the order
of a multiresolution analysis is N in case every polynomial p(x) of degree strictly less than N
can be written as a linear combination of scaling functions of a given level. In other words
polynomials of degree less than N belong to all spaces Vj. For example, in case of the Haar
multiresolution analysis, N = 1. Constant functions belong to all Vj spaces. In case of the hat
function, N = 2 because all linear functions belong to all Vj spaces. Note that the order of
a multiresolution analysis is the same as the order of the predictor used to build the scaling
functions.
This concludes the discussion of subdivision, scaling functions, and multiresolution. Remem-
ber that the original motivation for treating subdivision was that it provides predictors which
readily �t into the lifting framework. In the next section we turn to the other main component
of lifting: update.
1.8 Update Methods
We have seen several examples of wiring diagram constructions. In the very beginning of this
chapter we started with the Haar forward and inverse transform. In that case it was clear from
inspection what the update box had to do, given that we wanted the coarser signal to be an
average of the �ner signal. For the linear transform, �nding the update box required some
algebraic manipulations. Generally update boxes are designed to ensure that the coarser signal
has the same average as the higher resolution signal.
In the section on subdivision methods we discussed various ways of realizing predictors in
the context of an inverse transform with, e�ectively, zero wavelet coe�cients. The diagrams in
47
Figures 1.8, 1.9, and 1.10 show where the update boxes go, but we have not yet explained how
they should be designed.
That is the subject matter of this section. We begin by making some simple observations
about forward and inverse transforms and give a number of design criteria for update boxes. As in
the section on subdivision we will only consider the regular setting and postpone generalizations
to Chapter 2.
1.8.1 Forward and Inverse Transforms
One of the nice features of the lifting formalism is the ease with which one �nds the inverse
transform: simply ip the diagram left to right, switch additions and subtractions, and switch
multiplications and divisions. In this way forward and inverse transform are equivalent from
a design point of view. For some questions, it is easier to consider one rather than the other.
For example, in the case of borrowing predictors from subdivision methods it is most natural to
�rst consider the inverse transform, and later derive the forward transform. These relationships
between forward and inverse \wiring diagrams" lead directly to important consequences.
Let us consider the case of subdivision more closely. It is a linear transformation which
takes in m degrees of freedom on the top wire, say coe�cients associated with the integers,
and outputs (2m � 1) degrees of freedom, i.e., coe�cients associated with the half integers
due to interpolating subdivision. We are a bit sloppy here when ignoring issues such as the
boundary, but for purposes of our argument we may do so for now (later on when we discuss
the generalization to boundaries we will see that the present argument is solid.) Figure 1.17
indicates this idea on the left.
A question that immediately arises is whether we can characterize the remainingm�1 degrees
of freedom in a simple and useful form. This is an important aspect of multiresolution analysis
where we have the spaces Vj � Vj+1 and we may ask how to characterize the di�erence between
two such consecutive spaces: what is the di�erence in \resolution" between the two spaces?
What are we loosing when we go from a �ner resolution to a coarser resolution?
Since subdivision is a linear transformation we may think of it as a ((2m�1)�m) matrix. In
that language the question of characterizing the extra degrees of freedom, is to ask for a set of
columns to add to this matrix so that it becomes square, is banded, and its inverse is banded as
well. At �rst sight these requirements appear to be rather hard to satisfy. However, we already
know the answer to this question!
Suppose we have the inverse transform diagram with the detail wire added (on the right
48
Interpolating subdivision
m DOFs
Interpolating subdivision completed
2m-1 DOFs
m-1 DOFs
m DOFs
2m-1 DOFsI
P MergeI
Merge
even
odd odd
even
P
Figure 1.17: On the left pure interpolating subdivision is indicated: odd samples are computed
using a predictor based on even samples. Beginning with m degrees of freedom, (2m� 1) result
after one step (in the bounded interval case.) Simply extending the lower wire to the left is a
proper completion, i.e., it describes the extra degrees of freedom added by one subdivision step.
The proof consists of observing that the completed diagram is invertible (as opposed to the pure
subdivision diagram which is not.) Given that it is invertible the bottom wire must represent the
remaining DOFs. In the language of matrices|all operations are linear|this simply states that
subdivision can be completed in such a way that the resulting matrix is banded and its inverse is
banded as well.
side of Figure 1.17,) then we can immediately build a diagram with the forward transform
(\running everything backwards.") Concatenating the two we get the identity by construction.
Since the whole operation is linear we have just convinced ourselves that the linear operator
represented by the inverse transform is invertible, if we also consider the lower detail wire. In
other words, writing subdivision in our somewhat nonstandard way, we can immediately read
o� a representation of the additional degrees of freedom introduced as one goes from Vj to Vj+1:
simply put the delta sequence �0;l on the detail wire. In the case of interpolating subdivision
(Figure 1.18 shows the example of linear interpolating subdivision) the result is a delta sequence
�1;2l+1. If we continue the subdivision process ad in�nitum the functions shown result. Note
how the functions due to (successive) smooth coe�cients overlap. The functions due to detail
coe�cients, the actual wavelets, are the same shape in this case, but smaller and sit inbetween.
A more complicated (and interesting) example is provided by the subdivision diagram for the
cubic B-splines. Figure 1.19 illustrates the behavior of this subdivision diagram as a function of
putting successive delta sequences on the smooth and detail wire respectively. For the smooth
wire �0;k results in the well known sequence 1=8 f1; 4; 6; 4; 1g of re�nement coe�cients. Putting
49
������������
������������
�� ��
Limit functions
...0,0,1,0,0...
...0,0,0,1,0,0,0...
Limit functions
��
...0,0,.5,1,.5,0,0...
��������������
������������
������������
������������
...0,0,1,0,0...
P
MergePI
MergeI
Figure 1.18: The result of putting a delta sequence on the top (smooth) or bottom (detail)
wire of an interpolating subdivision (inverse transform) in the case of the linear predictor. On
the right the resulting functions, after subdivision ad in�nitum, for a sequence of consecutive
delta inputs. Note how the detail functions sit \inbetween" the scaling functions. The picture is
essentially the same for higher order interpolating predictors: the detail functions are inbetween
the scaling functions and dilated (by a factor of 2) versions of the scaling functions.
�0;l, on the detail wire results in the sequence 1=4 f1; 4; 1g. From our considerations above we
know that this sequence must be a completion of the space spanned by the 1=8 f1; 4; 6; 4; 1g and
in this way captures the degrees of freedom inbetween Vj and Vj+1.
In principle we could stop here and would not have to worry about update boxes. The
problem is that there are many possible completions and the ones we get \for free" from the
subdivision diagram are not necessarily the only ones or the best ones. Note that the detail
functions which we got in the linear (Figure 1.18) and cubic B-spline case (Figure 1.19) do not
have a zero integral, a condition we need to ensure that the coarser version of the signal has the
same integral as the �ner version (see the next section.)
Designing update boxes is all about manipulating the representation of these di�erences, of
\inbetween" degrees of freedom. That this should be so is easy to see. The update box on the
50
/2
����������8
...0,1,4,6,4,1,0...
Podd
MergePeven
Limit functions
����������...0,0,1,4,1,0,0...4
...0,0,1,0,0...
...0,0,1,0,0...
Limit functions
����
����
����
������
evenPP
oddMerge
/2
Figure 1.19: The result of putting a delta sequence on the top (smooth) or bottom (detail) wire
of a cubic B-spline subdivision (inverse transform.) On the right the resulting functions, after
subdivision ad in�nitum, for a sequence of consecutive delta inputs.
inverse transform side makes a contribution to the smooth wire based on a signal on the odd
wire (see Figures 1.8, 1.9, and 1.10.) Considering the sequence �0;l on the detail wire we can see
how having a U box will alter the sequence generated at the end (see Figure 1.20.)
From the examples of the Haar and Linear transform we remember that the main purpose of
the update box is to ensure that the average of the coarser level signals is maintained, i.e,
2�j2j�1Xl=0
sj;l (1.4)
is independent of j. From this we see that there is no need for an extra update box in the case
of wavelet transforms built from average-interpolation. Given that we can write the forward
transform as a Haar transform followed by one average-interpolation prediction operation, the
update box which is part of the Haar transform already assures (1.4). We next discuss how to
design update boxes for interpolating subdivision and cubic B-spline subdivision.
51
1.8.2 Update for Interpolation
We already encountered one update step for the linear example in the very beginning. It
consisted of computing the coarser level coe�cients as
sj�1;l = sj;2l + 1=4 (dj�1;l�1 + dj�1;l):
We show here how such an update can be derived in general. The main purpose of the update
step is to ensure that the average is maintained. Saying that the average of the signal sj is
equal to the average of the signal sj�1 is equivelent to saying that the average of the detail or
di�erence signal dj�1 is zero. Given that the detail signal is nothing else but a linear combination
of complementary sequences as derived in the previous section, it is su�cient to construct
complementary sequences with zero average. We consider Figure 1.18 again. The complementary
sequence, i.e., the result from putting a 1 on the odd wire is f0; 0; 1; 0; 0g. Obviously this does
not have a zero average. Therefore we use the update box from the odd wire to the even wire (see
Figure 1.20.) The result of putting a 1 on the even wire is f1=2; 1; 1=2; 0; 0g or f0; 0; 1=2; 1; 1=2g
Merge
...0,0,1,0,0...
...0,0,-.25,-.25,0...
...0,-.125,.75,-.125,0...
...0,-.125,-.25,.75,-.25,-.125,0...U
IP
Limit functions
����
����
Figure 1.20: Starting with the delta sequence on the bottom wire, the update box make a con-
tribution to the even wire, which in turn causes further changes in the odd wire after prediction.
The result is the sequence at the end which corresponds to the indicated limit functions. The
average of the coe�cients is zero, as expected.
depending on the position of the original 1. The update box now combines these sequences to
build a complementary sequence with average zero. We propose an update which adds a fraction
52
A of the odd element into the two neighboring evens. This leads to a complementary sequence
of the form
f0; 0; 1; 0; 0g �A f1=2; 1; 1=2; 0; 0g �A f0; 0; 1=2; 1; 1=2g:
Choosing A = 1=4 leads to the complementary sequence
f�1=8;�1=4; 3=4;�1=4;�1=8g;
which has average zero as desired.
Other types of update can be used as well. They ensure that not only the average but also
the �rst ~N moments of the sequences are preserved, i.e.,
mp = 2�j2j�1Xl=0
lp sj;l (1.5)
is independent of j for 0 � p < ~N . The update weights can be found easily as well. In [27] it is
shown that as long as ~N � N one can simply take the same weights for update as one used for
prediction, divided by two. For example, in case the predictor is of order N � 4, one can get an
update with ~N = 4 by letting
sj�1;l = sj;2l � 1=32 dj�1;l�2 + 9=32 dj�1;l�1 + 9=32 dj�1;l � 1=32 dj�1;l+1:
1.8.3 Update for cubic B-splines
The update box for the B-splines can be found using the same reasoning as in the previous
section. The complementary sequence now is
f0; 0; 1=4; 1; 1=4; 0; 0g �A=8 f1; 4; 6; 4; 1; 0; 0g �A=8 f0; 0; 1; 4; 6; 4; 1g;
which leads to A = 3=8.
1.9 Wavelet Basis Functions�
In this section we formally establish the relationship between detail coe�cients and the wavelet
functions. In the earlier section on multiresolution analysis (Section 1.7) we described spaces Vj
which are spanned by the fundamental solutions of the subdivision process, the scaling functions
'j;l. Here we will examine the di�erences Vj+1nVj and the wavelets j;l(x) which span these
di�erences.
53
Consider an initial signal sn = fsn;l j l 2 Zg. We can associate a function sn(x) in Vn with
this signal:
sn(x) =Xl
sn;l'n;l(x):
Calculate one level of the wavelet transform as described in an earlier section. This yields coarser
coe�cients sn�1;l and coe�cients dn�1;l. The coarser coe�cients sn�1;l corresponds to a new
function in Vn�1:
sn�1(x) =Xl
sn�1;l'n�1;l(x):
Recall that 'n�l;l(x) is the function that results if we run the inverse transform on the sequence
�n�1;l. With this the above equation expresses the fact that the function sn�1(x) is the result of
\assembling" the functions associated with a 1 in each of the positions (n� 1; l), each weighted
by sn�1;l. At �rst sight it is unclear which function the signal dn�1 corresponds to. Somehow we
feel that it corresponds to the \di�erence" of the signals sn(x) and sn�1(x). To �nd the solution
we need to de�ne the wavelet function.
Consider a detail coe�cient d0;0 = 1 and set all other detail coe�cients d0;k with k 6= 0 to
zero. Now compute one level of an inverse wavelet transform. This corresponds to what we
did in Figures 1.18 and 1.19 when putting a single 1 on the lower wire. After a single step of
an inverse transform this yields coe�cients gl of a function in V1, e.g., gl = f0; 1; 0g in case of
linears without lifting in Figure 1.18 or gl = f�1=8;�1=4; 3=4;�1=4;�1=8g in the case of linears
with lifting. We de�ne this function to be the wavelet function (x) = 0;0(x). Thus:
0;0(x) =Xl
gl '1;l(x) =Xl
gl '(2x� l):
This wavelet function is often called the mother wavelet. All other wavelets j;k are obtained by
setting dj;k = 1 and dj;l with l 6= k to zero, calculating one level of the inverse wavelet transforms
and then subdividing ad in�nitum to get the corresponding function in Vj+1. Using an argument
similar to the case of scaling functions, one can show that all wavelets are translates and dilates
of the mother wavelet:
j;k(x) = (2jx� k):
Figure 1.21 shows several mother wavelets coming from interpolation, Figure 1.22 shows wavelets
resulting from average-interpolation, and Figure 1.23 shows a wavelet associated with cubic B-
spline scaling functions.
54
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
Figure 1.21: Wavelets from interpolating subdivision. Going from left to right, top to bottom
are the wavelets of order N = 2, 4, 6, and 8. Each time ~N = 2.
Now we can answer the question from the start of this section. We �rst de�ne the detail
function dn�1(x) to be
dn�1(x) =Xl
dn�1;l n�1;l:
Because of linear superposition of the wavelet transform, it follows that
sn(x) = sn�1(x) + dn�1(x) =Xl
sn�1;l 'n�1;l(x) +Xl
dn�1;l n�1;l:
In words, the function de�ned by the sequence sn;l is equal to the result of performing one
forward transform step, computing sn�1;l (upper wire) and dn�1;l (lower wire,) and then using
these coe�cients in a linear superposition of the elementary functions which result when running
an inverse transform on a single 1 on the top or bottom wire, in the respective position.
So what we expected is true: the detail function dn�1(x) is nothing but the di�erence between
the original function and the coarser version. As we will see later, there are many di�erent ways
55
-4 -3 -2 -1 0 1 2 3 4
-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4
-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4
-0.5
0.0
0.5
-4 -3 -2 -1 0 1 2 3 4
-0.5
0.0
0.5
Figure 1.22: Wavelets from average-interpolation. Going from left to right, top to bottom are
the wavelets which correspond to the orders N = 1, 3, 5, and 7. Each time ~N = 1.
-4 -3 -2 -1 0 1 2 3 4-0.5
0.0
0.5
Figure 1.23: Cubic B-spline wavelet
to de�ne a detail function dn�1. They typically depend on two things: �rst, the detail coe�cients
dn�1;l (which are computed in the forward transform) and secondly the wavelet functions n�1;l
56
(which are built during the inverse transform.)
The n-level wavelet decomposition of a function sn(x) is de�ned as
sn(x) = s0(x) +n�1Xj=0
dj(x):
The space Wj is de�ned to be the space that contains the di�erence functions:
Wj = spanf j;l(x) j l 2 Zg:
It then follows that Wj is a space complementing Vj in Vj+1
Vj+1 = Vj �Wj:
In an earlier section we de�ned the notion of order of a multiresolution analysis. If the order
is N , then the wavelet transform started from any polynomial p(x) = sn(x) of degree less than
N will only yield zero wavelet coe�cients dj;l. Consequently all detail functions dj(x) are zero
and all sj(x) with j < n are equal to p(x).
In this section we introduce the dual order ~N of a multiresolution analysis. We say that the
dual order is ~N in case the wavelets have ~N vanishing moments:Z +1
�1
xp j;l(x) dx = 0 for 0 � p < ~N:
Because of translation and dilation, if the mother wavelet has ~N vanishing moments, then all
wavelets do. As a results all detail functions dj(x) in a wavelet representation also have ~N
vanishing moments and all coarser versions sj(x) of a function sn(x) have the �rst ~N moments
independent of j: Z +1
�1
xp sj(x) dx =
Z +1
�1
xp sj+1(x) dx:
Remark: Readers who are familiar with the more traditional signal processing introduction to
wavelets will note that order and dual order relate to the localization of the signals in frequency.
Typically the coarser function sn�1(x) will contain the lower frequency band while the detail
function dn�1(x) contains the higher frequency band. The order of a MRA is related to the
smoothness of the scaling functions and thus to how much aliasing occurs from the lower band
to the higher band. The dual order corresponds to the cancellation of the wavelets and thus to
how much aliasing occurs from the higher band to the lower. In the lifting scheme, the predict
part ensures a certain order, while the update part ensures a particular dual order.
57
58
Chapter 2
Second Generation Wavelets
2.1 Introduction
In the �rst chapter we only considered the regular setting, i.e., all samples are equally spaced,
and subdivision always puts new samples in the middle between old samples. Consequently
a sample value sj;k \lives" at the location k 2�j and all the scaling functions and wavelets we
constructed were dyadic translates and dilates of one �xed \mother" function. We refer to
these as �rst generation wavelets. We described the construction of wavelets with the help
of the lifting scheme, which only uses techniques of the spatial domain. However, historically
�rst generation wavelets were always constructed in the frequency domain with the help of the
Fourier transform, see e.g. [9]. All the wavelets and scaling functions we described in the �rst
chapter can be derived with these classical methods. Using the lifting scheme though makes it
very straightforward to build wavelets and scaling functions in much more general settings in
which the Fourier transform is not applicable anymore as a construction tool.
In the following sections we consider more general settings such as boundaries, irregular sam-
ples, and arbitrary weight functions. These cases do not allow for wavelets which are translates
and dilates of one �xed function, i.e., the wavelets at an interval boundary are not just translates
of nearby wavelets. This lack of translation and dilation invariance requires new construction
tools, such as lifting, to replace the Fourier transform. Lifting constructions are performed en-
tirely in the spatial domain and can be applied in the more general, irregular settings. Even
though the wavelets which result from using the lifting scheme in the more general settings will
not be translates and dilates of one function anymore, they still have all the powerful properties
59
of �rst generation wavelets: fast transforms, localization, and good approximation. We therefore
refer to them as Second Generation Wavelets [26].
The purpose of the remainder of this chapter is to show that subdivision, interpolation, and
lifting taken together result in a versatile and straightforward to implement second generation
wavelets toolkit. All the algorithms can be derived via simple arguments involving little more
than the manipulation of polynomials (as already seen in the �rst chapter.)
We focus on three settings which lead to Second Generation Wavelets:
� Intervals: When working with �nite data it is desirable to have basis functions adapted
to life on an interval. This way no awkward solutions such as zero padding, periodization,
or re ection are needed. We point out that many wavelet constructions on the interval
already exist, see [1, 6, 3], but we would like to use the subdivision schemes adapted to
boundaries since they lead to more straightforward constructions and implementations.
� Irregular samples: In many practical applications, the samples do not necessarily live
on a regular grid. Resampling is fraught with pitfalls and may even be impossible. A basis
and transform adapted to the irregular grid is desired.
� Weighted inner products: Often one needs a basis adapted to a weighted inner product
instead of the regular L2 inner product. A weighted inner product of two functions f and
g is de�ned as
hf; gi =
Zw(x) f(x) g(x) dx;
where w(x) is some positive function. Weighted wavelets are very useful in the solution
of boundary value ODEs, see [25]. Also, as we will see later, they are useful in the
approximation of functions with singularities.
Obviously, we are also interested in combinations of these settings. Once we know how to handle
each separately, combined settings can be dealt with easily.
2.2 Interpolating Subdivision and Scaling Functions
2.2.1 Interval Constructions
Recall that interpolating subdivision assembles N = 2D coe�cients sj;k in each step. These
uniquely de�ne a polynomial p(x) of degree N � 1. This polynomial is then used to generate
60
Figure 2.1: Behavior of the cubic interpolating subdivision near the boundary. The midpoint
samples between k = 2; 3 and k = 1; 2 are una�ected by the boundary. When attempting to
compute the midpoint sample for the interval k = 0; 1 we must modify the procedure since there
is no neighbor to the left for the cubic interpolation problem. Instead we choose 3 neighbors to
the right. Note how this results in the same cubic polynomial as used in the de�nition of the
midpoint value k = 1; 2, except this time it is evaluated at 1=2 rather than 3=2. The procedure
clearly preserves the cubic reconstruction property even at the interval boundary and is thus the
natural choice for the boundary modi�cation.
one new coe�cient sj+1;l. The new coe�cient is located in the middle of the N old coe�cients.
When working on an interval the same principle can be used as long as we are su�ciently far
from the boundary. Close to the boundary we need to adapt this scheme. Consider the case
where one wants to generate a new coe�cient sj+1;l, but is unable to �nd the same number
of old samples sj;k to the left as to the right of the new sample, simply because they are not
available. The basic idea is then to choose, from the set of available samples sj;k, those N which
are closest to the new coe�cient sj+1;l.
To be concrete, take the interval [0; 1]. We have 2j + 1 coe�cients sj;k at locations k 2�j for
0 � k � 2j . The left most coe�cient sj+1;0 is simply sj;0. The next one, sj+1;1 is found by
constructing the interpolating polynomial to the points (xj;k; sj;k) for 0 � k < N and evaluating
it at xj+1;1. For sj+1;2 we evaluate the same polynomial p at xj+1;2. Similar constructions work
for the other N boundary coe�cients and the right side. Figure 2.1 shows this idea for a concrete
example.
2.2.2 Irregular Samples
The case of irregular samples can also be accommodated by observing that interpolating subdi-
vision does not require the samples to be on a regular grid. We can take an arbitrarily spaced
61
0 1 2 3 4 5 6 7 8-1
0
1
0 1 2 3 4 5 6 7 8-1
0
1
0 1 2 3 4 5 6 7 8-1
0
1
0 1 2 3 4 5 6 7 8-1
0
1
Figure 2.2: Examples of scaling functions a�ected by a boundary. Left to right top to bottom
scaling functions of cubic (N = 4) interpolation with k = 0; 1; 2; 3. Note how the boundary
scaling functions are still interpolating as one would expect.
set of points xj;k with xj+1;2k = xj;k and xj;k < xj;k+1. A coe�cient sj;k lives at the location
xj;k. The subdivision schemes can now be applied in a straightforward manner.
2.2.3 Weighted Inner Products
Interpolating subdivision does not involve an inner product, hence a weighted inner product
does not change the subdivision part. However, the update part does change since it involves
integrals of scaling functions, as we shall see soon. We postpone the details of this until after
the general section on computing update weights.
62
2.2.4 Scaling Functions
As in the �rst generation case, we de�ne the scaling function 'j;k to be the result of running
the subdivision scheme ad in�nitum starting from a sequence sj;k0 = �j;k0 . The main di�erence
with the �rst generation case is that, because of the irregular setting, the scaling functions are
not necessarily translates and dilates of each other. For example, Figure 2.2 shows the scaling
functions a�ected by the boundary. The main feature of the second generation setting is that
the powerful properties such as approximation order, the re�nement relation, and the connection
with wavelets remain valid. We summarize the main properties:
� The limit function of the subdivision scheme started at level j with coe�cients sj;k can be
written as
f =Xk
sj;k 'j;k:
� The scaling functions are compactly supported.
� The scaling functions are interpolating:
'j;k(xj;l) = �k;l:
� Polynomials upto degree N � 1 can be written as linear combinations of the scaling func-
tions at level j. Because of the interpolating property, the coe�cients are samples of the
polynomial: Xk
xp
j;k'j;k = xp for 0 � p < N:
� Very little is known about the smoothness of the resulting scaling functions in the irreg-
ular case. Recall that they are de�ned as the limit of a fully nonstationary subdivision
scheme. Work in progress though suggests that with some very reasonable conditions on
the weight function or the placement of the sample locations, one can obtain roughly the
same smoothness as in the regular case.
� They satisfy re�nement relations. Start the subdivision on level j with sj;k = �j;k. We know
that the subdivision scheme converges to 'j;k. Now do only one step of the subdivision
scheme. Call the resulting coe�cients hj;k;l = sj+1;l. Only a �nite number are non zero.
63
Since starting the subdivision scheme at level j + 1 with the fhj;k;l j lg coe�cients also
converges to 'j;k, we have that
'j;k(x) =Xl
hj;k;l 'j+1;l(x): (2.1)
Note how in this case the coe�cients of the re�nement relation are (in general) di�erent
for each scaling function. Compare this with the �rst generation setting of Equation (1.2),
where hj;k;l = hl�2k. In that setting the hj;k;l are translation and dilation invariant.
Depending on the circumstances there are two basic ways of performing this subdivision process.
The more general way is to always construct the respective interpolating polynomials on the y
using an algorithm such as Neville. This has the advantage that none of the sample locations
have to be known beforehand.
However, in case the sample locations are �xed and known ahead of time, one can precompute
the subdivision, or �lter, coe�cients. These will be the same as the ones from the re�nement
relation: assume we are given the samples fsj;k j kg and we want to compute the fsj+1;l j lg.
Given that the whole process is linear, we can simply use superposition. A 1 at location (j; k)
would, after subdivision, give the sequence fhj;k;l j lg. Superposition now immediately leads to:
sj+1;l =Xk
hj;k;l sj;k: (2.2)
This is an equivalent formulation of the subdivision. It requires the precomputation of the hj;k;l.
These can be found once o�ine by using the polynomial interpolation algorithm of Neville.
Compare Equations (2.1) and (2.2) which use the same coe�cients hj;k;l. In (2.1) the sum-
mation ranges over l and the equation allows us to go from a �ne level scaling function to a
coarse level scaling function. In (2.2) the summation ranges over k and the equation allows us
to go from coarse level samples to �ne level samples.
2.3 The Unbalanced Haar Transform
Before we discuss how average-interpolation works in the second generation setting, we �rst take
a look at an example. The Unbalanced Haar transform is the generalization of the Haar wavelet
to the second generation setting [17]. The �rst di�erence with the interpolating case is that a
coe�cient sj;k does not \live" at location xj;k any more, but rather on the interval [xj;k; xj;k+1].
64
We de�ne the generalized length of an interval [xj;k; xj;k+1] as
Ij;k =
Z xj;k+1
xj;k
w(x) dx;
where w(x) is some weight function. From the de�nition, it immediately follows that
Ij�1;k = Ij;2k + Ij;2k+1: (2.3)
With this de�nition Ij;k measures the weight given to a particular interval and its associated
coe�cient sj;k. Given a signal sj, the important quantity to preserve is not so much the average
of the coe�cients, but their weighted average:
Xk
Ij;k sj;k
With this in mind we can de�ne the generalization of the Haar transform, which is called the
Unbalanced Haar Transform. The detail wavelet coe�cient is still computed as before:
dj�1;k = sj;2k+1 � sj;2k;
but the average is now computed as a weighted average:
sj�1;k =Ij;2k sj;2k + Ij;2k+1 sj;2k+1
Ij�1;k:
De�ning the transform this way assures two things:
1. If the original signal is a constant, then all detail coe�cients are zero and all coarser versions
are constants as well. This follows from the de�nition of the transform and Equation (2.3).
2. The weighted average of all coarser signals is the same, i.e.
Xl
Ij;l sj;l
does not depend on j. This follows from computing the coarser signals as weighted aver-
ages.
Later we will see that the order of the Unbalanced Haar MRA as well as its dual order are one.
65
Next we have to cast this in the lifting framework of split, predict, and update. The split
divides the signal in even and odd indexed coe�cients. The prediction for an odd coe�cient
sj;2k+1 is its left neighboring even sample sj;2k which leads to the detail coe�cient:
dj�1;k = sj;2k+1 � sj;2k:
In the update step, the coarser level is computed as [26]
sj�1;k = sj;2k +Ij;2k+1
Ij�1;kdj�1;k:
Using Equation (2.3), one can see that this is equivalent to the weighted average computation.
2.4 Average-Interpolating Subdivision
In this section we discuss how average-interpolation works in the second generation case as
introduced in [25]. The setting is very similar to the �rst generation case. We �rst describe the
average-interpolating subdivision scheme and then show how this �ts into the lifting strategy.
We start by assuming that we are given weighted averages of some unknown function over
intervals
sn;l =1
In;l
Z xn;l+1
xn;l
w(x) f(x) dx:
Just as before we de�ne average-interpolating subdivision through the use to higher order poly-
nomials, with the �rst interesting choice being quadratic. For a given interval consider the
intervals to its left and right. De�ne the (unique) quadratic polynomial p(x) so that
sj;k�1 =1
Ij;k�1
Z xj;k
xj;k�1
w(x) p(x) dx
sj;k =1
Ij;k
Z xj;k+1
xj;k
w(x) p(x) dx
sj;k+1 =1
Ij;k+1
Z xj;k+2
xj;k+1
w(x) p(x) dx:
Now compute sj+1;2k and sj+1;2k+1 as the average of this polynomial over the left and right
subintervals of [xj;k; xj;k+1]
sj+1;2k =1
Ij+1;2k
Z xj+1;2k+1
xj+1;2k
w(x) p(x) dx
sj+1;2k+1 =1
Ij+1;2k+1
Z xj+1;2k+2
xj+1;2k+1
w(x) p(x) dx:
66
It is easy to see that the procedure will reproduce quadratic polynomials. Assume that the
initial averages fs0;kg were weighted averages of a given quadratic polynomial P (x). In that
case the unique polynomial p(x) which has the prescribed averages over each triple of intervals
will always be that same polynomial P (x) which gave rise to the initial set of averages. Since the
interval sizes go to zero and the averages over the intervals approach the value of the underlying
function in the limit the original quadratic polynomial P (x) will be reproduced.
Higher order schemes can be constructed similarly. We construct a polynomial p of degree
N � 1 (where N = 2D + 1) so that
sj;k+l =1
Ij;k+l
Z xj;k+l+1
xj;k+l
w(x) p(x) dx for �D � l � D;
Then we calculate two coe�cients on the next �ner level as
sj+1;2k =1
Ij+1;2k
Z xj+1;2k+1
xj+1;2k
w(x) p(x) dx
sj+1;2k+1 =1
Ij+1;2k+1
Z xj+1;2k+2
xj+1;2k+1
w(x) p(x) dx:
The computation is very similar to the �rst generation case, except for the fact that the poly-
nomial problem cannot be recast into a Neville algorithm any longer since the integral of a
polynomial times the weight function is not necessarily a polynomial.
These algorithms take care of the weighted setting and the irregular samples setting. In the
case of an interval construction, we follow the same philosophy as in the interpolating case. We
need to assemble N coe�cients to determine an average-interpolating polynomial. In case we
cannot align them symmetrically around the new samples, as at the end of the interval, we
simply take more from one side than the other. This idea is illustrated in Figure 2.3.
Next we cast average-interpolation into the lifting framework. We use the average-interpolating
subdivision as a P box before entering the inverse Unbalanced Haar transform. The diagram
in Figure 1.9 illustrates this setup. Instead of computing sj+1;2k and sj+1;2k+1 directly we will
compute their di�erence dj;k = sj+1;2k+1 � sj+1;2k and feed this as a di�erence signal into the
inverse Unbalanced Haar transform. Given that the weighted average of sj+1;2k and sj+1;2k+1, as
computed by average-interpolation is sj;k, it follows that the inverse Unbalanced Haar transform
when given sj;k and dj;k will compute sj+1;2k and sj+1;2k+1 as desired.
67
Figure 2.3: Behavior of the quadratic average-interpolation process near the boundary. The
averages for the subintervals k = 2 and k = 1 are una�ected. When attempting to compute
the �ner averages for the left most interval the procedure needs to be modi�ed since no further
average to the left of k = 0 exists for the average-interpolation problem. Instead we use 2
intervals to the right of k = 0, e�ectively reusing the same average-interpolating polynomial
constructed for the subinterval averages on k = 1. Once again it is immediately clear that this
is the natural modi�cation to the process near the boundary, since it insures that the crucial
quadratic reproduction property is preserved.
2.5 Average-Interpolating Scaling Functions
As before an average-interpolating scaling function 'j;k(x) is de�ned as the limit function of the
subdivision process started on level j with the sequence �j;k. We here list the main properties:
� The limit function of the subdivision scheme started at level j with coe�cients sj;k can be
written as
f =Xk
sj;k 'j;k:
� The scaling functions are compactly supported.
� The scaling functions are average-interpolating:
1
Ij;l
Z xj;l+1
xj;l
w(x)'j;k(x) dx = �k;l:
� The scaling functions of level j reproduce polynomials upto degree N � 1.
Xk
cj;k 'j;k = xp for 0 � p < N
68
0 1 2 3 4 5 6 7 8-1.0
0.0
1.0
0 1 2 3 4 5 6 7 8-1.0
0.0
1.0
0 1 2 3 4 5 6 7 8-1.0
0.0
1.0
0 1 2 3 4 5 6 7 8-1.0
0.0
1.0
Figure 2.4: Examples of scaling functions a�ected by a boundary. Left to right, top to bottom
scaling functions of quadratic (N = 3) average-interpolation at k = 0; 1; 2; 3. The functions
continue to have averages over each integer subinterval of either 0 or 1.
with coe�cients
cj;k =1
Ij;k
Z xj;k+1
xj;k
w(x)xp 'j;k(x) dx:
� The scaling functions satisfy re�nement relations:
'j;k =Xl
hj;k;l 'j+1;l: (2.4)
Figure 2.4 shows the average-interpolating boundary functions.
2.6 Cubic B-spline Scaling Functions
In the case of cubic B-splines we need to worry about the endpoints of a �nite sized interval.
Because of their support the scaling functions close to the endpoints would overlap the outside of
69
the interval. This issue can be addressed in a number of di�erent ways. One treatment, used by
Chui and Quak [3], uses multiple knots at the endpoints of the interval. The appropriate subdi-
vision weights then follow from the evaluation of the de Boor algorithm for those control points.
The total number of scaling functions at level j becomes 2j + 3 in this setting. Consequently it
is not so easy anymore to express everything in a framework which is based on insertion of new
control points inbetween old ones. We used a di�erent treatment which preserves this property.
The Podd box remains as before at the boundary|every odd location still has an even neighbor
on either side|but we change the Peven box since the left- (right-) most even position has only
one odd neighbor. In this case the Peven box makes no contribution to the boundary control
point and furthermore the boundary control point does not get rescaled. This leads to endpoint
interpolating piecewise cubic polynomial scaling functions as shown in Figure 2.5.
0 1 2 3 4 5 6 7 8-1
0
1
0 1 2 3 4 5 6 7 8-1
0
1
Figure 2.5: In the case of cubic B-splines only the two leftmost splines change for the particular
adaptation to the boundary which we chose.
2.7 Multiresolution Analysis
Now that we have de�ned subdivision and scaling functions in the second generation setting, it is
a small step to multiresolution analysis. Remember that the result of the subdivision algorithm
started from level j can always be written as a linear combination of scaling functions.
sj(x) =Xk
sj;k 'j;k(x):
The de�nition of the Vj spaces is now exactly the same as in the �rst generation case:
Vj = span f'j;k j 0 � k < Kjg:
70
We assume Kj scaling functions on level j. It follows from the re�nement relations (2.1) that
the spaces are nested:
Vj � Vj+1:
Again we want that any function of �nite energy (2 L2) can be approximated arbitrarily closely
with scaling functions. Mathematically we write this as[j>0
Vj is dense in L2:
The order of the MRA is de�ned similarly to the �rst generation case. We say that the order
is N in case every polynomial of degree less than N can be written as a linear combination
of scaling functions on a given level. The subdivision schemes we saw in the previous sections
have order N where N is odd for interpolating subdivision and even for average-interpolating
subdivision.
Integrals of Scaling Functions We will later see that in order to build wavelets, it is im-
portant to know the integral of each scaling function. In the �rst generation case this is not an
issues. Due to translation and dilation the integral of 'j;l(x) is always 2�j . In the second gen-
eration case, the integrals are not given by such a simple rule. Therefore we need an algorithm
to compute them. We de�ne:
Mj;k :=
Z +1
�1
w(x)'j;k(x) dx:
The computation goes in two phases. We �rst approximate the integrals on the �nest level n
numerically using a simple quadrature formula. The ones on the coarser levels j < n can be
computed iteratively. From integrating the re�nement relation (2.1) it immediately follows that
Mj;k =Xl
hj;k;lMj+1;l:
Once we computed the hj;k;l coe�cients, the recursive computation of the integrals of the scaling
functions is straightforward. We will later need them in the update stage.
2.8 Lifting and Interpolation
In this section, we discuss how to use the lifting scheme to compute second generation wavelet
transforms. The steps will be exactly the same as in the �rst generation case: split, predict,
update.
71
The split stage again is the Lazy wavelet transform. It simply consists of splitting the samples
sj;l into the even indexed samples sj;2k and the odd indexed samples sj;2k+1.
In the predict stage we take the even samples and use interpolating subdivision to predict
each odd sample. The detail coe�cient is the di�erence of the odd sample and its predicted
value. Suppose we use a subdivision scheme of order N = 2D to build the predictor. The detail
coe�cient is then computed as
dj�1;k := sj;2k+1 � p(xj;2k+1);
where p(x) is the interpolating polynomial of degree N � 1 which interpolates the points
(xj�1;k+l; sj�1;k+1) with �D + 1 � l � D. Thus if the original signal is a polynomial of degree
strictly less than N , the detail signal is exactly zero.
The purpose of the update stage is to preserve the weighted average on each level; we want
that Z +1
�1
Xk
sj;k 'j;k(x) dx =Xk
sj;kMj;k
does not depend on the level j. According to the lifting scheme we do this using the detail
computed in the previous step. We propose an update step of the form:
sj�1;k := sj;2k +Aj;k�1 dj;k�1 +Aj;k dj;k: (2.5)
In order to �nd the Aj:k, assume we run the inverse transform from level j � 1 to j starting
with all sj�1;l zero and only one dj�1;k non-zero. Then undoing the update and running the
subdivision scheme should result in a function with integral zero. Undoing the update will result
in two non zero even coe�cients, sj;2k and sj;2k+2. Undoing the update involves computing:
sj;2k := sj�1;k �Aj;k�1 dj;k�1 �Aj;k dj;k
sj;2k+2 := sj�1;k+1 �Aj;k dj;k �Aj;k+1 dj;k+1:
Given that only dj;k is non zero, we have that sj;2k = �Aj;k and sj;2k+2 = �Aj;k. Now running
the subdivision scheme results in a function given by
'j;2k+1(x)�Aj;k 'j�1;k(x)�Aj;k 'j�1;k+1(x): (2.6)
This function has to have integral zero. Thus:
Aj;k =Mj;2k+1
Mj�1;k +Mj�1;k+1:
72
This shows us how to choose Aj;k.
One can also build more powerful update methods, which not only assure that the integral
of the sj(x) functions is preserved, but also their �rst (generalized) moment:
Z +1
�1
w(x)x sj(x) dx:
This requires the calculation of the �rst order moments of the scaling functions, which can be
done analogously to the integral calculations. The lifting (2.6) then has di�erent lifting weights
for 'j�1;k and 'j�1;k+1 (as opposed to Aj;k for both) which can be found by solving a 2 � 2
linear system.
2.9 Wavelet functions
0 1 2 3 4 5 6 7 8-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8-0.5
0.0
0.5
Figure 2.6: Examples of wavelets a�ected by a boundary. Going left to right, top to bottom
wavelets with ~N = 2 vanishing moments and order N = 4 for k = 0; 1; 2; 3.
73
0 1 2 3 4 5 6 7 8
-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8
-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8
-0.5
0.0
0.5
0 1 2 3 4 5 6 7 8
-0.5
0.0
0.5
Figure 2.7: Examples of wavelets a�ected by a boundary. Going left to right, top to bottom
wavelets with ~N = 1 vanishing moment and order N = 3 for k = 0; 1; 2; 3.
We can now de�ne wavelet functions exactly the same way as in the �rst generation case.
Compute an inverse wavelet transform from level j to level j+1 with all coarse scale coe�cients
sj;l set to zero and only one detail coe�cient dj;k set to one while the other detail coe�cients are
zero. Call the resulting sequence fgj;k;l j lg. Then run the subdivision algorithm. The resulting
function is the wavelet j;k:
j;k =Xl
gj;k;l 'j+1;l: (2.7)
Figures 2.6, 2.7, and 2.8 show the wavelets a�ected by the boundary in the interpolating, average-
interpolation, and cubic B-spline case respectively. In the interpolating case N = 4 and the
wavelets, built with lifting, have ~N = 2 vanishing moments. In the average-interpolation case
N = 3 and the wavelets have ~N = 1. For the B-splines the wavelets have ~N = 2 vanishing
moments.
74
0 1 2 3 4 5 6 7 8-0.5
0.5
0 1 2 3 4 5 6 7 8-0.5
0.5
0 1 2 3 4 5 6 7 8-0.5
0.5
Figure 2.8: The two left most (top row) wavelets are in uenced by the boundary. Thereafter
they default to the usual B-spline wavelets (bottom left).
De�ne the detail function
dj(x) =Xk
dj;k j;k(x):
It then follows that
sj+1(x) = sj(x) + dj(x):
The multiresolution representation of a function sn(x) can now be written as
sn(x) = s0(x) + d0(x) + d1(x) + : : : dn�1(x) = s0(x) +n�1Xj=0
Xk
dj;k j;k:
The main advantage of the wavelet transform is the fact that the expected value of the detail
coe�cient magnitudes is much smaller than the original samples. This is how we obtain a more
compact representation of the original signal.
75
Remember that the order of a multiresolution analysis is N if the multiresolution represen-
tation of any polynomial of degree strictly less than N yields only zero detail signals. In other
words if sn(x) = p(x) is a polynomial of degree less than N , then sn(x) = s0(x) and all details
are identically zero.
Another quantity which characterizes a multiresolution analysis is its dual order. We say
that the dual order is ~N in case all wavelets have ~N vanishing moments orZ +1
�1
w(x)xp j;k(x) dx = 0 for 0 � p < ~N:
We only consider the case where ~N = 1. The same techniques can be used for the more general
case if so needed. Consequently all detail signals have a vanishing integral,Z +1
�1
w(x) dj(x) dx = 0:
This is equivalent to saying that Z +1
�1
w(x) sj(x) dx
is independent of the level j.
We de�ne the subspaces Wj as
Wj = spanf j;k j 0 � k < Kj+1 �Kjg;
then
Vj+1 = Vj �Wj :: (2.8)
The dimension of Wj is thus the dimension of Vj+1 (Kj+1) minus the dimension of Vj (Kj).
2.10 Applications
In this section we describe results of some experiments involving the ideas presented earlier. The
examples were generated with a simple C code whose implementation is a direct transliteration of
the algorithms described above. The only essential piece of code imported was an implementation
of Neville's algorithm from Numerical Recipes [21]. All examples were computed on the unit
interval, that is all constructions are adapted to the boundary as described earlier. The only
code modi�cation to accommodate this is to insure that the moving window of coe�cients does
not cross the left or right end point of the interval. The case of a weight function required
somewhat more machinery which we describe in that section.
76
0.0 0.2 0.4 0.6 0.8 1.0-2.0
-1.0
0.0
1.0
2.0
0.0 0.2 0.4 0.6 0.8 1.0-2.0
-1.0
0.0
1.0
2.0
Figure 2.9: Example of scaling functions (top) with N = 4 (interpolating subdivision) and
wavelets (bottom) with ~N = 2 vanishing moments adapted to irregular sample locations. The
original sample locations x3;k are highlighted with diamond marks. Note that one of the scaling
functions is 1 at each of the marks, while all others are 0.
77
2.10.1 Interpolation of Randomly Sampled Data
The �rst and simplest generalization concerns the use of xj0;k placed at random locations.
Figure 2.9 shows the scaling functions (top) and wavelets (bottom) which result for such a set of
random locations. The scaling functions are of order N = 4 (interpolating subdivision) and the
wavelets have ~N = 2 vanishing moments. In this case we placed 7 uniformly random samples
between x3;0 = 0 and x3;8 = 1. These locations are discernible in the graph as the unique points
at which all scaling functions have a root save for one which takes on the value 1 (indicated by
solid diamond marks). Sample points at �ner levels were generated recursively by simply adding
midpoints, i.e., xj+1;2k+1 = 1=2 (xj;k + xj;k+1) for j > 3.
An interesting question is how the new sample points should be placed. A disadvantage of
always adding midpoints is that imbalances between the lengths of the intervals are maintained.
A way to avoid this is to place new sample points only in intervals whose length is larger than the
average interval length. Doing so repeatedly will bring the ratio of largest to smallest interval
length ever closer to 1.
Another possible approach would add new points so that the length of the intervals varies in
a smooth manner, i.e., no large intervals neighbor small intervals. This can be done by applying
an interpolating subdivision scheme, with integers as sample locations, to the xj;k themselves
to �nd the xj+1;2k+1. This would result in a smooth mapping from the integers to the xj;k.
After performing this step the usual interpolating subdivision would follow. Depending on the
application one of these schemes may be preferable.
Next we took some random data over a random set of 16 sample locations and applied linear
(N = 2) and cubic (N = 4) interpolating subdivision to them. The resulting interpolating
functions are compared on the right side of Figure 2.10. These functions can be thought of as
a linear superposition of the kinds of scaling functions we constructed above for the example
j = 3.
Note how sample points which are very close to each other can introduce sharp features in the
resulting function. We also note that the interpolation of order 4 exhibits some of the overshoot
behavior one would expect when encountering long and steep sections of the curve followed by
a reversal of direction. This behavior gets worse for higher order interpolation schemes. These
experiments suggest that it might be desirable to enforce some condition on the ratio of the
largest to the smallest interval in a random sample construction.
78
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
Figure 2.10: Example of data de�ned at random locations x4;k on the unit interval and inter-
polated with interpolating subdivision of order N = 2 and 4 respectively.
2.10.2 Smoothing of Randomly Sampled Data
A typical use for wavelet constructions over irregular sample locations is smoothing of data
acquired at such locations. As an example of this we took 512 uniformly random locations on
the unit interval (V9) and initialized them with averages of sin(3=4�x) with �20% additive white
noise. The resulting function is plotted on the top left of Figure 2.11 at level 9. The scaling
functions used were based on average-interpolation with N = 5 and ~N = 1. Smoothing was
performed by going to coarser spaces (lower index), setting all wavelet coe�cients to zero and
subdividing back out. From left to right, top to bottom the coarsest level used in the transform
is V9, V7, V5, and V3.
We hasten to point out that this is is a very simple and naive smoothing technique. Depending
on the application and knowledge of the underlying processes much more powerful smoothing
operators can be constructed [15, 14]. This example merely serves to suggest that such operations
can also be performed over irregular samples.
2.10.3 Weighted Inner Products
When we discussed the construction of scaling functions and wavelets we pointed out how a
weight function in the inner product can be incorporated in the transform. The only change
in the code is due to the fact that we cannot cast the average-interpolation problem into the
form of a Neville interpolation algorithm anymore, since in general the integral of a polynomial
79
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
Figure 2.11: A sine wave with additive noise sampled at uniformly distributed random locations
in the unit interval and reconstructed with quintic average-interpolation (upper left). Successive
smoothings are performed by going to coarser resolutions and cascading back out (upper right
and bottom row).
times the weight function is not another polynomial. Instead we �rst explicitly construct the
polynomial p in the subdivision and use it to �nd the �lter coe�cients. This implies solving
the underlying linear system which relates the coe�cients of p(x) to the observed weighted
averages. Similarly, when lifting the interpolating wavelets to give them 2 vanishing moments
the weighted moments of the scaling function enter. In both of these cases the construction
of weighted bases requires additional code to compute moments and solve the linear systems
involved in �nding the �lters. We saw in Section 2.7 how moment and integral calculations can
80
0.0 0.2 0.4
-1.0
-0.5
0.0
0.5
0.0 0.2 0.4-1.0
0.0
1.0
2.0
Figure 2.12: Comparison of weighted (solid line) and unweighted (dotted line) wavelets at the
left endpoint of the interval where the weight function x�1=2 becomes singular. On the left N = 4
and ~N = 2; on the right N = 5 and ~N = 1. Note how the weighted wavelets take on smaller
values at zero in order to adapt to the weight function whose value tends to in�nity.
be performed recursively from the �nest level on up by using the re�nement relationship for the
scaling function. Without going into any further detail we point out that moment calculations
and the solution of the linear system to �nd p(x) can be numerically delicate. The stability
depends on which polynomial basis is used. For example, we found the linear systems that
result when expressing everything with respect to global monomial moments so ill-conditioned
as to be unsolvable even in double precision. The solution lies in using a local polynomial, i.e.,
a basis which changes for each interval. A better choice might be a basis of local orthogonal
polynomials.
In our experiments we used the weight function x�1=2 which is singular at the left interval
boundary. To compute the moments we used local monomials, resulting in integrals for which
analytic expressions are available.
Figure 2.12 shows some of the resulting wavelets. In both cases we show the left most wavelet,
which is most impacted by the weight function. Weighted and unweighted wavelets further to the
right become ever more similar. Part of the reason why they look similar is the normalization.
For example, both weighted and unweighted scaling functions have to satisfyP
k 'j;k = 1. The
images show wavelets with N = 4 (interpolating) and ~N = 2 vanishing moments on the left
and wavelets N = 5 (average-interpolation) and ~N = 1 vanishing moment on the right. In both
cases the weighted wavelet is shown with a solid line and the unweighted case with a dotted line.
81
1E+
011E
+02
unweighted case
1E-03
1E-02
1E-01
1E+
00
relative l2 error
N =
1N
= 3
N =
5N
= 7
1E+
011E
+02
weighted case
1E-03
1E-02
1E-01
1E+
00
relative l2 error
N =
1N
= 3
N =
5N
= 7
Figure2.13:Compariso
nofapproxim
atio
nerro
rwhen
expandingthefunctio
nsin
(4�x1=2)
over[0;1=2]
usin
gwaveletsconstru
cted
with
respectto
anunweigh
tedinner
prod
uct
(left)anda
weigh
tedinner
prod
uctwith
weigh
tx�1=2(righ
t).Here
N=1,3,5,and7.
Theweigh
tedandunweigh
tedwavelets
areonly
slightly
di�eren
tin
shape.
How
ever,when
applied
totheexpansion
ofsom
efunction
they
canmake
adram
aticdi�eren
ce.Asan
exam
ple
weapplied
both
types
ofwavelets
tothefunction
f(x)=
sin(4�x1=2),
which
has
adivergen
t
derivative
atzero.
With
unweigh
tedwavelets
thecon
vergence
willbeslow
closeto
thesin
gularity,
typically
O(h)with
h=2�jindependentofN.In
other
word
s,there
isnogain
inusin
ghigh
er
order
wavelets.
How
ever,ifwebuild
weigh
tedwavelets
forwhich
theweigh
tfunction
timesf
isan
analy
ticfunction
,wecan
expect
O(h
N)behavior
everywhere
again.For
ourexam
plewe
cantake
w(x)=x�1=2.
Thisway
theweigh
tedwavelets
areadapted
tothesin
gularity
ofthe
function
f.Figu
re2.13
show
stheerror
intheresu
ltingexpansion
swith
orderN
=1,3,5,and
7anddual
order
~N=
1.For
unweigh
tedwavelets
high
erord
ercon
struction
sonly
getbetter
byacon
stantfactor,
while
theweigh
tedwavelets
show
high
erord
ercon
vergence
when
goingto
high
erord
erwavelets.
2.11
Warning
Like
every\doit
yourself
athom
e"product
this
onecom
eswith
awarn
ing.
Most
ofthe
techniques
wepresen
tedhere
arestraigh
tforward
toim
plem
entandbefore
youknow
ityou
will
82
be generating wavelets yourself. However, we did not discuss most of the deeper underlying
mathematical properties which assure that everything works like we expect it to. These address
issues such as: What are the conditions on the subdivision scheme so that it generates smooth
functions? or: Do the resulting scaling functions and wavelets generate a stable, i.e., Riesz
basis? These questions are not easily answered and require some heavy mathematics. One of
the fundamental questions is how properties, such as convergence of the subdivision algorithm,
Riesz bounds, and smoothness, can be related back to properties of the �lter sequences. This is
a very hard question and at this moment no general answer is available to our knowledge.
We restrict ourselves here to a short description of the extent to which these questions have
been answered. In the classical case, i.e., regular samples and no weight function, everything
essentially works. The regularity of the basis functions varies linearly with N . In the case
of the interval, regular samples, and no weight function, again the same results hold. This is
because the boundary basis functions are �nite linear combinations of the ones from the real
line. In the case of regular samples with a weight function, it can be shown that with some
minimal conditions on the weight function, the basis functions have the same regularity as in
the unweighted case. In the case of irregular samples, little is known at this moment. Everything
essentially depends on how irregular the samples are. It might be possible to obtain results under
the conditions that the irregular samples are not too far from the regular samples, but this has
to be studied in detail in the future.
Recent results concerning general multiscale transforms and their stability were obtained by
Wolfgang Dahmen and his collaborators. They have been working (independently from [26, 27])
on a scheme which is very similar to the lifting scheme [2, 8]. In particular, Dahmen shows
in [7] which properties in addition to invertibility of the transform are needed to assure stable
bases. Whether this result can be applied to the bases constructed here needs to be studied in
the future.
2.12 Outlook
So far we have only discussed the construction of second generation wavelets on the real line or
the interval. Most of the techniques presented here such as polynomial subdivision and lifting
extend easily to much more general sets. In particular domains in Rn, curves, surfaces, and
manifolds.
One example is the construction of wavelets on the sphere [22]. There we use the lifting
83
scheme to construct locally supported, biorthogonal spherical wavelets and their associated
fast transforms. The construction starts from a recursive triangulation of the sphere and is
parameterization independent. Since the construction does not rely on any speci�c properties
of the sphere it can be generalized to other surfaces. The only question which needs to be
addressed is what the right replacement for polynomials is. Polynomials restricted to a sphere
are still a natural choice because of the connection with spherical harmonics, but on a general
surface this is no longer the case.
84
Bibliography
[1] L. Andersson, N. Hall, B. Jawerth, and G. Peters. Wavelets on closed subsets of the real
line. In [23], pages 1{61.
[2] J. M. Carnicer, W. Dahmen, and J. M. Pe~na. Local decompositions of re�nable spaces.
Technical report, Insitut f�ur Geometrie und angewandete Mathematik, RWTH Aachen,
1994.
[3] C. Chui and E. Quak. Wavelets on a bounded interval. In D. Braess and L. L. Schumaker,
editors, Numerical Methods of Approximation Theory, pages 1{24. Birkh�auser-Verlag, Basel,
1992.
[4] C. K. Chui, L. Montefusco, and L. Puccio, editors. Conference on Wavelets: Theory,
Algorithms, and Applications. Academic Press, San Diego, CA, 1994.
[5] A. Cohen, I. Daubechies, and J. Feauveau. Bi-orthogonal bases of compactly supported
wavelets. Comm. Pure Appl. Math., 45:485{560, 1992.
[6] A. Cohen, I. Daubechies, and P. Vial. Multiresolution analysis, wavelets and fast algorithms
on an interval. Appl. Comput. Harmon. Anal., 1(1):54{81, 1993.
[7] W. Dahmen. Stability of multiscale transformations. Technical report, Institut f�ur Geome-
trie und angewandete Mathematik, RWTH Aachen, 1994.
[8] W. Dahmen, S. Pr�ossdorf, and R. Schneider. Multiscale methods for pseudo-di�erential
equations on smooth manifolds. In [4], pages 385{424.
[9] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF Regional Conf. Series in Appl. Math.,
Vol. 61. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
85
[10] G. Deslauriers and S. Dubuc. Interpolation dyadique. In Fractals, dimensions non enti�eres
et applications, pages 44{55. Masson, Paris, 1987.
[11] G. Deslauriers and S. Dubuc. Symmetric iterative interpolation processes. Constr. Approx.,
5(1):49{68, 1989.
[12] D. L. Donoho. Smooth wavelet decompositions with blocky coe�cient kernels. In [23],
pages 259{308.
[13] D. L. Donoho. Interpolating wavelet transforms. Preprint, Department of Statistics, Stan-
ford University, 1992.
[14] D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation via wavelet shrinkage.
Biometrika, to appear, 1994.
[15] D. L. Donoho and I. M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage.
1995.
[16] N. Dyn, J. A. Gregory, and D. Levin. A four-point interpolatory subdivision scheme for
curve design. Computer Aided Geometric Design, (4):257{268, 1987.
[17] M. Girardi and W. Sweldens. A new class of unbalanced Haar wavelets that form an
unconditional basis for lp on general measure spaces. Technical Report 1995:2, Industrial
Mathematics Initiative, Department of Mathematics, University of South Carolina, 1995�.
[18] M. Lounsbery, T. D. DeRose, and J. Warren. Multiresolution surfaces of arbitrary topo-
logical type. Department of Computer Science and Engineering 93-10-05, University of
Washington, October 1993. Updated version available as 93-10-05b, January, 1994.
[19] S. G. Mallat. Multiresolution approximations and wavelet orthonormal bases of L2(R).
Trans. Amer. Math. Soc., 315(1):69{87, 1989.
[20] Y. Meyer. Ondelettes et Op�erateurs, I: Ondelettes, II: Op�erateurs de Calder�on-Zygmund,
III: (with R. Coifman), Op�erateurs multilin�eaires. Hermann, Paris, 1990. English transla-
tion of �rst volume, Wavelets and Operators, is published by Cambridge University Press,
1993.
[21] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes.
Cambridge University Press, 2nd edition, 1993.
86
[22] P. Schr�oder and W. Sweldens. Spherical wavelets: E�ciently representing functions on the
sphere. Computer Graphics Proceedings, (SIGGRAPH 95), pages 161{172, 1995.
[23] L. L. Schumaker and G. Webb, editors. Recent Advances in Wavelet Analysis. Academic
Press, New York, 1993.
[24] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer Verlag, New York,
1980.
[25] W. Sweldens. Construction and Applications of Wavelets in Numerical Analysis. PhD
thesis, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, 1994.
[26] W. Sweldens. The lifting scheme: A construction of second generation wavelets. Technical
Report 1995:6, Industrial Mathematics Initiative, Department of Mathematics, University
of South Carolina, 1995.
[27] W. Sweldens. The lifting scheme: A custom-design construction of biorthogonal wavelets.
Appl. Comput. Harmon. Anal., 3(2):186{200, 1996.
[28] M. Vetterli and C. Herley. Wavelets and �lter banks: theory and design. IEEE Trans.
Acoust. Speech Signal Process., 40(9):2207{2232, 1992.
87
88
Afternoon Section: Applications