Physics 213 Lecture Notes 10-22-03.txt
REVIEW FROM LAST LECTURE
- Introduced and illustrated the definitions of dissipative,
conservative, and expansive orbits X(t) of a flow X'(t)=F(X) in
terms of the time-averaged divergence of the Jacobian matrix:
< Div[F(X)] > = < Trace[Jacobian[F](X)] >
which is negative, zero, and positive respectively.
Historically, the concept of conservative
(energy-conserving) systems came first as the giants of
classical mechanics such as Lagrange and Hamilton worked
out deep ways to generalize Newton's laws of
motion. Later, their formalism was extended to
non-conserving systems and then it was quickly appreciated
that a key distinguishing feature was that infinitesimal
phase space volumes were no longer conserved. As with many
important mathematical definitions, the concept did not
appear out of thin air but grew out of thinking over
several centuries of theoretical reasoning.
- We applied this definition to several systems, e.g., to
a damped harmonic oscillator and to the Lorenz
equations. For problems which have a natural concept of
friction, the time-averaged divergence < Div[F(X)] >
is negative because of that friction,
but our definition is much more general and can be applied
to problems for which there is no natural concept of
energy or friction or forces.
- We derived the result that
(1/dV) d(dV)/dt = Div[F(X)] ,
that the divergence of the vector field F(X) driving the
flow gave the relative rate of change of infinitesimal
volumes in phase space.
DISSIPATION CRITERION FOR MAPS:
- Just as we defined phase space for a first-order
autonomous flow, we can define a phase space for any
N-dimensional map X[i+1] = G( X[i] ). Provided G(X) is a
smooth vector field as a function of the vector X, we
automatically get existence and uniqueness of orbits
through any initial condition X[0] in phase space.
Existence just follows from the fact that we can evaluate
G(X) explicitly. Uniqueness follows from the fact that
G(X) is a single-valued function, hence can produce only a
single output for any vector input.
- The main difference of orbits from maps, compared to
orbits from flows, is that the map values jump from one
point in phase space to another with successive
iterations. Periodic orbits of maps are really just cycles
in which a finite number of points are visited in some
order. Quasiperiodic or more complex (chaotic) orbits
correspond to continuous curves or sets of points that are
filled out in the limit of infinitely many
iterations.
Note that although one jumps from point to point in phase
space under a map, any given point is still a continuous
and differentiable function of the initial data, just as
for flows. This follows since:
X[i+1] = G( X[i] ) = G(G( X[i-1] ))
        = G(G(...G(X[0])...)) ,
the "i+1" term can be written explicitly as i+1 applications
of the vector function G to the initial data X[0].
- Just as we defined dissipation for flows to be average
contraction of phase space volumes, we can introduce a
similar definition for maps. We consider some small ball
of points in phase space centered on a given point of an
orbit X[i] and take each point in the ball to be an
initial condition for a new orbit. The whole ball then
jumps around in phase space (remaining centered on X[i+1],
then X[i+2] etc.) but deforms because of the different
directions and magnitudes of the vector field G(X).
- The analogous dissipation definition for maps X[i+1] =
G(X[i]) is that the average-over-orbit magnitude of the
determinant of the Jacobian of G(X) has magnitude less
than one:
< | Det[ Jacobian[ G(X[i]) ] ] | > < 1 for dissipative orbits
< | Det[ Jacobian[ G(X[i]) ] ] | > = 1 for conservative orbits
< | Det[ Jacobian[ G(X[i]) ] ] | > > 1 for expansive orbits
where again explicitly:
< A(X) > = (1/N) Sum[ A(X[i]) , {i,1,N} ] .
If the average is larger than 1, the map is unphysical,
although still possibly interesting as a mathematical
problem.
- For maps, the infinitesimal phase space volumes do not
evolve continuously with time so we must have a different
relation between the infinitesimal volume dV[i+1] at
iteration i+1 in terms of dV[i]. I claim that:
dV[i+1] / dV[i]  =  | Det[ Jacobian[ G(X[i]) ] ] | ,
the ratio of successive infinitesimal volumes is
determined by the magnitude of the determinant of the
Jacobian matrix; this involves the product of the
eigenvalues of the Jacobian, as opposed to its sum as for
flows. Dissipative orbits then again correspond to an
average shrinking of phase space volumes along the orbit.
Note the central role of the Jacobian matrix
Jacobian[G](X) in the determination of dissipation: for
flows, the average sum of the eigenvalues occurs while for
maps, the average product of the eigenvalues is used.
EXAMPLES OF DISSIPATIVE MAPS
- Example 1. Logistic map, x[i+1] = a x[i] (1 - x[i])
is 1d map with:
| Det[ Jacobian[G] ] | = a | (1 - 2 x) | ,
where 0 <= x[i] <= 1 for all i, and 0 <= a <= 4 for
orbits to be bounded. To be dissipative, the
parameter a needs to satisfy the inequality:
< | 1 - 2 x[i] | > < 1 / a .
For small a, a < 1, the map is dissipative for all initial
conditions since the left side can not be bigger than 1 if
x[i] lives in the interval [0,1]. For larger a, there are
dissipative and non-dissipative orbits and the criterion
is harder to evaluate except numerically.
The logistic map has two fixed points, x=0 and x=1-1/a. At
the fixed point x=0, the determinant of the Jacobian has
the value "a" and so this fixed point is dissipative
provided a < 1. At the other fixed point, x=1-1/a, the
dissipation criterion becomes "|2-a| < 1" and the fixed
point is dissipative only for 1 < a < 3. So the two fixed
points can't be simultaneously dissipative.
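Here is a minimal Mathematica sketch (my addition, not from the
original lecture) that estimates the orbit average < | a (1 - 2 x[i]) | >
numerically; the parameter values, initial condition, and transient
length are arbitrary illustrative choices:
logistic[a_][x_] := a x (1 - x)
(* orbit average of |Det[Jacobian]| = |a (1 - 2 x)|, dropping a transient *)
avgDetJ[a_, x0_, n_] :=
    Mean[ Abs[ a (1 - 2 Drop[NestList[logistic[a], x0, n], 200]) ] ]
avgDetJ[0.8, 0.3, 2000]  (* about 0.8 < 1 : dissipative, orbit collapses onto x = 0 *)
avgDetJ[2.0, 0.3, 2000]  (* about 0   < 1 : strongly dissipative near x = 1 - 1/a *)
avgDetJ[4.0, 0.3, 2000]  (* about 2.5 > 1 : expansive chaotic orbit *)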
- Example 2. The Henon map lives in a two-dimensional
phase space and is defined by
x[i+1] = y[i] + 1 - a x[i]^2 ,
y[i+1] = b x[i] ,
see Section 12.2 (pages 429-434) of Strogatz. The
Jacobian matrix is
Jacobian[G] ={ {- 2 a x[i] , 1}, {b, 0} }
and this has determinant:
Det[ Jacobian[G] ] = - b ,
which has the constant magnitude |b|. If this is
less than one, |b| < 1, the Henon map is
dissipative and contracts phase space volumes.
From:
dV[i] / dV[i-1] = |b| ,
we deduce that phase space volumes contract geometrically
as |b|^i with successive iterations.
- If |b| = 1, the map is conservative and
phase space volumes are conserved along orbits. If |b| >
1, the map is unphysical and most orbits are
unbounded. Pictures of the iterations of this map are
given in Strogatz, Section 12.2, pages 429-434. One finds
a complex orbit (believed but not known yet rigorously to
be a strange attractor) for the parameter values (a=1.4,
b=0.3) and for most initial conditions.
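As a complement, here is a short Mathematica sketch (not from the
lecture) that iterates the Henon map and confirms that the Jacobian
determinant equals the constant -b at every point of the orbit; the
initial condition and orbit length are arbitrary choices:
henon[a_, b_][{x_, y_}] := { y + 1 - a x^2 , b x }
jac[a_, b_][{x_, y_}] := { {-2 a x, 1}, {b, 0} }
orbit = NestList[ henon[1.4, 0.3], {0., 0.}, 5000 ];
Union[ Det /@ (jac[1.4, 0.3] /@ orbit) ]   (* every value equals -0.3 = -b *)
ListPlot[ Drop[orbit, 100], PlotStyle -> PointSize[Tiny] ]  (* the Henon attractor *)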
- Example 3: Four-stage Runge-Kutta numerical algorithm
for integrating flows:
K1 = dt F[ X[t] ]
K2 = dt F[ X[t] + (1/2) K1 ]
K3 = dt F[ X[t] + (1/2) K2 ]
K4 = dt F[ X[t] + K3 ]
Xnew = X[t] + (1/6) ( K1 + 2 K2 + 2 K3 + K4 )
This famous algorithm (invented around 1900) takes an initial
vector X(t) to a new vector Xnew that approximates the
analytical solution X(t+dt) at time t+dt, the value at the next
time step. The weightings of the vectors K1, K2, K3, and K4 are
chosen so that the Taylor series of each component of the
vector Xnew matches the Taylor series of the vector
X(t+dt;x(t)) to the highest possible order. The intermediate
function evaluations K1,...,K4 are called "stages" by numerical
analysts. With four stages, one can match Taylor series up to
fourth-order but not better, so that the local truncation error
behaves as:
|| X(t+dt) - Xnew || = O(dt^5) ,
which goes to zero with the FIFTH power of the time step. This
is pretty good accuracy, e.g., if all quantities are of O(1)
and dt=0.1, then the numerical error is of order dt^5 = 10^-5.
Note: The coefficients of the vectors K1, K2, K3, and K4 are
derived from a set of nonlinear equations whose number is
smaller than the number of coefficients, and so there is an
infinity of ways to choose these coefficients to achieve
O(dt^5) accuracy. This implies that there is much flexibility
in how to select a Runge-Kutta method, e.g., that satisfies
some extra constraint such as energy conservation.
- You can see that the 4-stage Runge-Kutta algorithm for
integrating a set of odes is a complicated but
well-defined map,
X[i+1] = G( X[i] ) ,
where we label Xnew = X[i+1] and X(t) = X[i]. So numerical
integration of a flow dX/dt=F(X) is accomplished by
evaluating some cleverly chosen map. We can then ask
whether this map is conservative or dissipative for the
harmonic oscillator, which is a conservative flow (the damped
harmonic oscillator discussed earlier with its damping set to
zero). Here is the
calculation in Mathematica:
f[ x_List ] := { x[[2]], - x[[1]] } (* oscillator vector field *)
x0 := {x1, x2 } (* initial data *)
k1 := dt f[ x0 ] (* first stage *)
k2 := dt f[ x0 + (1/2) k1 ] (* second stage *)
k3 := dt f[ x0 + (1/2) k2 ] (* third stage *)
k4 := dt f[ x0 + k3 ] (* fourth stage *)
xnew := Expand[ x0 + (1/6) ( k1 + 2 k2 + 2 k3 + k4 ) ]
jacobian = Transpose[ { D[xnew, x1], D[xnew, x2] } ]
Det[ jacobian ]
which gives the following simple result when the dust
clears:
Det = 1 - (1/72) dt^6 + (1/576) dt^8 .
The map is dissipative if the determinant has magnitude
less than one, conservative if equal to one, and expanding
if magnitude greater than one.
- For small time steps, the negative dt^6 term dominates
the smaller positive dt^8 term and the Runge-Kutta map is
weakly dissipative, even though the flow it is trying to
integrate is conservative. This is bad news: the numerical
approximation effectively introduces a small but finite
numerical friction and the numerical integration
corresponds to a weakly damped oscillator that slowly
loses energy.
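To see this numerical friction directly, here is a small Mathematica
experiment (my addition, not part of the original notes) that iterates
the Runge-Kutta map for the oscillator and monitors the energy
(x1^2 + x2^2)/2; the step size dt = 0.1 and the number of steps are
arbitrary choices:
f[ x_List ] := { x[[2]], - x[[1]] }      (* same oscillator vector field as above *)
rk4Step[dt_][x_] := Module[ {k1, k2, k3, k4},
    k1 = dt f[x]; k2 = dt f[x + k1/2]; k3 = dt f[x + k2/2]; k4 = dt f[x + k3];
    x + (k1 + 2 k2 + 2 k3 + k4)/6 ]
energies = ( #.#/2 & ) /@ NestList[ rk4Step[0.1], {1.0, 0.0}, 10000 ];
{ First[energies], Last[energies] }
(* the energy decreases slowly, by a factor of roughly (1 - dt^6/72)^10000 *)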
This explains your earlier homework problem: the
global error grows slowly but eventually stops growing
once the numerical oscillator has decayed to zero. The
global error does not decay for the driven damped
oscillator of the homework, x'' + 4 x' + x = 4 Cos[t],
since the numerical dissipation of the Runge-Kutta
algorithm is very small compared to the already existing
dissipation of the flow, and the driving force on the right
side continuously overcomes the dissipation.
- For the harmonic oscillator, there is a special time step
dt = 2 Sqrt[2] = 2.82843... (a LARGE value comparable to the
period of the oscillator) at which the map is conservative,
while for dt > 2.82843 the map increases phase space volumes
and the iterations should diverge. So we get a specific prediction
for the largest time step with which divergence is
avoided. You might be tempted to choose dt=2.82843 to
make the algorithm conservative, but this time step is so
large that one ends up with no numerical accuracy (the
Taylor series does not converge).
- The fact that the numerical integration algorithms become
conservative for magical time steps is well known and
important to scientists who study satellites and planetary
motion. These scientists often study numerical energy
conservation as a function of the time step, and find
numerous values for the time step for which dE/dt is
approximately zero. They then do their numerical
integration at one of these special values. See papers of
Jack Wisdom and collaborators at MIT, e.g., "Chaotic
Evolution of the Solar System," Gerald Sussman and Jack
Wisdom, Science 257, 3 July 1992.
- People have discovered numerical methods for integrating
flows of interacting particles that not only conserve
energy exactly, but also momentum and angular momentum
(which should also be preserved for a closed system of
particles).
DERIVATION OF PHASE SPACE VOLUME CHANGES: MAPS
- Consider a map X[i+1] = G( X[i] ), where the
vector field G(X) is smooth. Consider an infinitesimal
parallelepiped in N-dimensional phase space defined by N
tiny vectors V1,...,VN with base at X[i]. The
infinitesimal volume dV[i] of the parallelepiped spanned by
the vectors V1, ..., VN is then given by the determinant
of the matrix whose columns are the vectors:
dV[i] = Det[ {V1, ..., VN} ] ,
where the subscript i denotes that this is at the ith
iteration of the map G(X). I will let you figure out how
to prove this result by induction but hopefully you have
seen this before.
CHALLENGE: Assume that you have K < N N-dimensional
vectors in a N-dimensional phase space. These K vectors
span a K-dimensional parallelepiped with one corner at
the origin. Prove that the K-dimensional volume of this
parallelepiped is given by:
volume_K = Sqrt[ Det[ M^T M ] ] ,
where M = {V1,...,VK} is a NxK rectangular matrix whose
columns are the given vectors, M^T denotes the KxN
matrix transpose of M, and where M^T M denotes the
matrix product of M^T with M and so is a KxK
matrix. Neat!
- Now under the map G(X), X[i] goes to X[i+1] and
X[i] + V1 goes to
G(X[i] + V1) approx G(X[i]) + Jacobian[G](X[i]) . V1 ,
where I have used the vector form of Taylor's theorem to
lowest (linear) order in the small quantity V1. We see
that each vector Vi is simply stretched and twisted by the
Jacobian matrix to lowest order in a Taylor series. We
have used the assumption that the vectors Vi are small to
invoke the Taylor expansion.
- The volume of the new infinitesimal parallelepiped
is given by the same formula:
dV[i+1] = Det[ {M.V1, ..., M.VN} ] ,
where I have used the matrix M to abbreviate the notation
for the Jacobian matrix, Jacobian[G[X[i]]]. But elementary
linear algebra gives
{M.V1, ..., M.VN} = M . {V1, ..., VN} ,
the matrix with column vectors M.Vi is the same as the
matrix product of M with the matrix {V1,...,VN}. And
since the determinant of the product of matrices is the
product of determinants, we see:
dV[i+1] = Det[M] dV[i]
or
|dV[i+1] / dV[i]| = | Det[ Jacobian[G[X[i]]] ] | .
Quod erat demonstrandum.
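As a sanity check (my own addition, not from the lecture), one can
verify this volume relation numerically for the Henon map of Example 2
by mapping a tiny square and measuring how its area changes; the base
point and the edge length eps are arbitrary choices:
g[{x_, y_}] := { y + 1 - 1.4 x^2, 0.3 x }      (* Henon map with a=1.4, b=0.3 *)
X = {0.2, 0.1}; eps = 10.^-5;
v1 = g[X + eps {1, 0}] - g[X];                 (* image of the first edge vector *)
v2 = g[X + eps {0, 1}] - g[X];                 (* image of the second edge vector *)
Abs[ Det[{v1, v2}] ] / eps^2                   (* approx 0.3 = |b| = |Det[Jacobian]| *)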
CONCEPTS OF ATTRACTORS, INVARIANT SETS, BASINS OF ATTRACTION:
- We now come to a key insight for the entire course:
dissipation is a crucial concept in the mathematical
theory of nonequilibrium systems because it greatly
reduces the effective size of the space of initial
conditions X0, in solutions X(t;X0;P):
DISSIPATIVE SYSTEMS HAVE TRANSIENT BEHAVIOR THAT
DECAYS, LEADING TO A STATISTICALLY STATIONARY STATE
WHOSE STATISTICAL PROPERTIES ARE INDEPENDENT OF MOST
INITIAL CONDITIONS.
The transient behavior is an immediate consequence of
volume contraction: any small cluster of initial
conditions define a volume that shrinks to zero volume
along an orbit. The transient is over when the volume is
effectively zero (or below any observable limit).
Thus the long-time properties of dissipative systems are
weakly dependent on the choice of initial condition. This
is an enormous and useful mathematical simplification
compared to conservative systems, for which there are no
transients and for which small changes in the initial
conditions can lead to statistically different solutions.
- Note the crucial point that it is the STATISTICAL
properties, not the detailed properties, that are
independent of the initial condition. Thus all initial
conditions except x1=x2=0 lead to a limit cycle in the Van
der Pol oscillator x'' + (x^2 - e) x' + x = 0, so the
final statistical behavior is periodic (a limit
cycle). The period, amplitude, power spectrum, etc. are
all the same. But the PHASE of the limit cycle, i.e., where on
the cycle you land when you fall onto this attractor, does
depend on the initial condition, so the final solutions are
not identical.
More precisely, the asymptotic set of points traced out by
the orbit X(t) in the limit of infinite time is the same.
- With this definition of dissipative systems, dissipative
dynamical systems have ATTRACTORS in bounded regions of
phase space. I.e., initial conditions get "pulled" or
"sucked" into certain regions of phase space of zero
dimensionality and settle on some set of points in phase
space that is INVARIANT under the equation.
An invariant set is one in which any initial condition
in that set leads to orbits that remain in that set,
i.e., it remains unchanged under the dynamics.
- The set of all points in phase space that are attracted to
a given attractor is called the BASIN OF ATTRACTION. This
is an invariant set since any point in the basin moves
onto another point that is also in the basin: they
all end up falling into the attractor.
Basins of attraction often have infinite volume in phase
space, even when the resulting attractor has zero volume.
For some simple systems, e.g., damped harmonic oscillator
or Van der Pol equation, all of space except for a few
points (e.g., any fixed points) constitutes the basin of
attraction.
More generally, dynamical systems typically have many
attractors in different parts of phase space and the phase
space is broken up into different basins that lead to a
given attractor. This intricate structure is often quite
important for the understanding of some given physical or
engineering system.
EXAMPLES OF INVARIANT SETS
- Any fixed point is a dynamical invariant since if you
start on this set, you stay on this set for all time.
- Any limit cycle (isolated periodic orbit or loop in phase
space) is an invariant set. If you choose any initial
condition on the limit cycle, you can't get off it by the
periodicity.
- Consider the 3d flow of the recent homework assignment for
three interacting bird species, which were of the form:
dx/dt = x f1(x,y,z) ,
dy/dt = y f2(x,y,z) ,
dz/dt = z f3(x,y,z) .
But then the three planes x=0, y=0, and z=0 are invariant
sets such that any initial condition lying in these sets
generates an orbit under this flow that never escapes
these planes. This is a simple consequence of the
existence and uniqueness theorem: the vector field F(X) is
smooth everywhere and if z=0 for some initial condition
(x0,y0,0), then z stays zero forever by the third
equation. Thus through each initial condition in the z=0
plane there is an orbit that lives within that plane.
Note: These invariant planes are NOT attractors since
initial conditions outside of these planes are not pulled
into these planes in the limit of infinite time.
The fact that these planes are invariant answers one of the
recent homework questions: an initial condition
(n1(t),n2(t),n3(t)) in the first positive octant of the
phase space ( ni(t) > 0 for i=1,2,3) can never get out of
this octant. This follows since the invariant planes are
already filled with orbits that lie in those planes
forever. Thus any orbit outside of those planes would have
to intersect some other orbit in finite time in order to
cross, which is not allowed by the uniqueness theorem.
You now see how this idea generalizes: if you have ANY
invariant set S that forms a (N-1)-dimensional surface in
a N-dimensional phase space, then it is impossible for
orbits outside of that surface to cross that surface.
- Mathematicians often add a further restriction to the
assumption that an attractor is an invariant set. They
also assume that the set is TOPOLOGICALLY TRANSITIVE
(ergodic in the language of physicists). This means that
there is at least one orbit on the attractor that comes
arbitrarily close to all other points on the
attractor. This property is obvious for fixed points,
limit cycles, and N-tori, but rather subtle and non-obvious
for strange attractors.
There are many other mathematical subtleties about how to
define these concepts but it turns out that the technical
details and subtleties rarely seem to be important on a
day-to-day basis for scientists and engineers working on
nonlinear problems.
EXAMPLES OF ATTRACTORS:
- If you type any number into a pocket calculator and hit
the COS key repeatedly (make sure that you are using
radians, not degrees!), you always end up with the same
number, approximately 0.73908513.... Mathematically,
hitting the COS key repeatedly is the same as iterating
the map
x[i+1] = Cos[ x[i] ] .
I will leave it to you as a moderate challenge to show
that this fixed point x* = 0.73908513... is a GLOBAL
attractor: all values x0 on the real line lead to a
sequence that converges to x*. The basin of attraction is
then the entire real line and the only attractor is x*.
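In Mathematica this is a one-liner (the starting value is arbitrary):
Nest[Cos, 1.0, 50]        (* approaches 0.739085... *)
FixedPoint[Cos, 1.0]      (* iterates the COS key until the value stops changing *)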
- For the logistic flow, x' = x (1 - x), there are two
fixed points that are invariant under the flow. The point
x=1 is an attractor since there is a basin of attraction
containing many points, all of which lead to orbits that
converge to x=1. The basin of attraction consists of all
points x > 0 (both 0 < x < 1 and x > 1). Points with x
< 0 lead to orbits that diverge to minus infinity.
- For the logistic map x[i+1] = 4 x[i] (1 - x[i]),
most orbits are topologically transitive in the interval
[0,1], i.e., they come arbitrarily close to all points in
[0,1]. Strictly speaking, [0,1] is NOT an attractor since
it is not contained in a superset of points that
constitute its basin of attraction. This is an example of
a chaotic but not fractal invariant set. ("Chaotic" means that
there is exponential magnification of perturbations to
initial data, i.e., there is a positive Lyapunov exponent
lambda_1.)
- The Van der Pol oscillator x'' + (x^2 - eps) x' + x = 0
has two invariant sets for eps > 0: the fixed point
x1=x2=0 at the origin and some limit cycle C that encloses
the origin. C is an attractor with a basin of attraction
consisting of all points inside C except the fixed point
at the origin and of all points outside of C.
- The Lorenz equations have a "strange" attractor for
r >approx 25. Many initial conditions end up on the same
geometric set, which has zero volume but infinite area, and
on which orbits depend sensitively on the choice of initial
conditions. The basin
of attraction is somewhat complex, but has infinite volume
in the 3d phase space. There are three fixed points which
are invariant but are not attracting for the classical
value r=28.
EXAMPLE OF ATTRACTORS: CHAOS AND FLIPPING A COIN
- A natural question that might have occurred to some of you
is whether the randomness of getting heads or tails from
flipping a coin has anything to do with chaos. The answer
is NO! A simple model for flipping a coin is that you give
it an initial torque (flicking it with your thumbnail) to
get it spinning and then send it up into the air. The
center of mass of the coin then follows a parabolic path
while the coin spins at constant uniform angular frequency
about its axis. If you flip the coin into a pail of sand
(for simplicity), then whether you get heads or tails just
depends on the orientation of the coin when it first hits
the sand.
The equation of motion of the spinning falling coin is a
LINEAR ode:
d(theta)/dt = omega ,
where omega is the constant angular rotation determined by
how hard you flicked it. I.e., you can think of the coin
as simply spinning at constant angular rotation from the
time it leaves your hand until it hits the floor. The
angle simply grows linearly with time:
theta = omega t + theta0 ,
and whether you get heads or tails simply depends on the
initial angle theta0 (zero if you hold the coin flat) and
the initial strength of the flick omega (which is very
hard to control and is where all the randomness comes
from).
In fact, flipping a coin is a conservative dynamical
system to an excellent approximation (except for the final
collision with the floor when the coin comes to a halt)
and so there can not be any attractors. The randomness
then arises from the fact that there are many tiny closely
spaced sets of initial conditions that lead to heads or
tails. There is no exponential stretching of nearby initial
points in phase space and so chaos is irrelevant here.
For a detailed and interesting discussion of this point,
you might like to read the following paper:
@article{Keller86,
author = {Joseph B. Keller},
title = {The probability of heads},
journal = {Am. Math. Monthly},
volume = {93},
pages = {191--197},
month = {March},
year = {1986}
}
NEWTON'S METHOD FOR ROOTS AS A DYNAMICAL MAP
- Isaac Newton's famous and important method for finding
roots of functions of a single variable f(x) is given by
the following iterative map,
x[i+1] = x[i] - f(x[i]) / f'(x[i])
= g( x[i] ) ,
with g(x) = x - f(x)/f'(x). This gives a dissipative map
that converges to an attractor (root of a function f(x))
for many but not all initial conditions.
This generalizes to N-dimensional systems of equations
F(X)=0 as follows:
X[i+1] = X[i] - Inverse[Jacobian[F](X[i])]. F(X[i]) ,
which involves solving a NxN set of linear equations
associated with the NxN Jacobian matrix of F(X). With
your expert knowledge of multivariate Taylor series,
you can derive this N-dimensional case in a
flash. Assume that you have an approximate initial
vector X[i] that approximates a solution of the vector
equation F(X)=0, and assume that a tiny correction
vector H, when added to the initial guess X[i], will
give you exactly zero:
F(X[i] + H) = 0 , || H || << 1 .
Taylor expanding to lowest order in the small quantity
H around the point X[i] gives:
F(X[i]) + Jacobian[F](X[i]) . H = 0 ,
which is precisely the Newton iteration formula: you
solve for the small vector H and then add H to X[i] to
get a better approximation.
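Here is a small Mathematica sketch of the N-dimensional Newton
iteration for an illustrative two-dimensional system of my own choosing
(not from the lecture), F = {x^2 + y^2 - 1, x - y}, whose positive root
is {1, 1}/Sqrt[2]:
fvec[{x_, y_}] := { x^2 + y^2 - 1, x - y }
jac = D[ fvec[{x, y}], {{x, y}} ];                        (* symbolic Jacobian matrix *)
newtonStep[X_] := X - LinearSolve[ jac /. Thread[{x, y} -> X], fvec[X] ]
NestList[ newtonStep, {2.0, 0.5}, 6 ]   (* converges rapidly to {0.7071..., 0.7071...} *)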
You can easily see that any fixed point of the Newton map
is indeed a zero of f(x) provided that f'(x) is not also
zero at the same time, i.e., x* is a simple root. Indeed,
assuming x[i]=x* is a fixed point we have:
x* = x* - f(x*) / f'(x*) , f'(x*) != 0
which implies f(x*)=0.
You can also see that the Newton map g(x) = x - f/f' is
strongly dissipative near a root for any function f(x)
since its Jacobian matrix (here just the derivative) g'(x),
g'(x) = f(x) f''(x) / (f'(x))^2 ,
evaluated at a simple zero of f(x) gives the value zero
and so definitely has magnitude less than one. The
vanishing of the Jacobian matrix at a fixed point is a
rather rare and special occurrence for dissipative maps
and is called SUPERCONVERGENCE. This vanishing explains a
remarkable property of Newton's method: when it converges,
it converges with lightning speed; each successive
iteration approximately doubles the number of significant
digits in the answer!
CHALLENGE: Check this rapid quadratic convergence by
iterating for the quadratic equation x^2 - 2 = 0 for
Sqrt[2].
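One way to carry out this check in Mathematica (the exact starting
guess x0 = 1 is an arbitrary choice; exact rational arithmetic makes
the digit-doubling easy to see):
newton[x_] := x - (x^2 - 2)/(2 x)
iterates = NestList[newton, 1, 6]     (* 1, 3/2, 17/12, 577/408, ... exact rationals *)
N[ iterates - Sqrt[2], 5 ]            (* the error is roughly squared at every step *)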
- We can now put on our nonlinear dynamical hats and think
of Newton's method as a dynamical map. The roots are then
attractors and it is then quite interesting to figure out
what the basins of attraction are: what initial conditions
flow to a given zero of F(X)=0 under the Newton iteration?
The bad news is that typically the basins of attraction
for the roots of a Newton method (and in fact for nearly
any nonlinear dynamical system) can be QUITE complex and
therefore nearly impossible to characterize. The good news
is that there is much unexpected geometric beauty
associated with the basins of attraction.
Let's try an explicit example by trying to find the three
cube roots of unity satisfying:
z^3 - 1 = 0 ,
by Newton's method. The three roots are:
z1 = 1 ;
z2 = Exp[2Pi I / 3] = Cos[2Pi/3] + I Sin[2Pi/3]
= - (1/2) + I Sqrt[3]/2 ;
z3 = Exp[4Pi I / 3] = Cos[4Pi/3] + I Sin[4Pi/3]
= - (1/2) - I Sqrt[3]/2 ;
which live at 120 degrees w.r.t. each other in the complex
z plane. We can try to locate these roots by using
Newton's method as follows (which generalizes
straightforwardly for complex-valued maps):
z[n+1] = z[n] - ( z[n]^3 - 1 ) / (3 z[n]^2) .
The initial condition z[1] must now be complex-valued to
converge to any root other than z1=1.
By symmetry and common sense, one would expect that the
set of all initial values z0 that are closer to some given
root z* than some other root will be in the basin of
attraction of z*. If true, then the basins of attraction
would be three symmetric wedge-shaped regions bounded by
straight lines. In fact, the basins of attraction for each
of the three roots are intricately woven objects of
stunning complexity: they are fractals. Their
boundaries have the bewildering WADA PROPERTY (named after
a Japanese mathematician who first pointed out this
possibility): each point on the boundary of one basin of
attraction is also on the boundary of the other two basins
of attraction, a difficult situation to imagine indeed but
rather common in nonlinear dynamics.
Note: I can't show the figures easily in my notes but I
show two slides of basins of attraction of roots in
class. There is also a color picture in Gleick's Chaos
book for the basins of the fourth roots of unity.
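If you want to generate such a picture yourself, here is one possible
Mathematica sketch (my own, not from the lecture); the grid resolution,
the iteration count, and the colors are arbitrary choices:
roots = { 1, Exp[2 Pi I/3], Exp[4 Pi I/3] };
newton[z_] := z - (z^3 - 1)/(3 z^2)
whichRoot[z0_] := First[ Ordering[ Abs[ Nest[newton, N[z0], 40] - roots ], 1 ] ]
basins = Table[ whichRoot[x + I y], {y, -1.505, 1.505, 0.01}, {x, -1.505, 1.505, 0.01} ];
ArrayPlot[ basins, ColorRules -> {1 -> Red, 2 -> Green, 3 -> Blue} ]
(* each color marks the basin of attraction of one of the three cube roots *)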
ATTRACTORS ARE ZERO VOLUME SETS:
- Attractors have zero N-dimensional volume in N-dimensional
phase space. This is simple: enclose the bounded attractor
in some finite union of N-dimensional boxes. Then follow
where all the points in these boxes go. Since the flow is
dissipative, the images of the boxes shrink to zero volume, and
yet these images still contain the attractor (which is
invariant). Thus:
D < N ,
where D is a dimension of the attractor and N is the phase
space dimension.
- The decreased dimensionality means generally that
it takes FEWER numbers than N to describe dynamics on the
attractor; the state vector is effectively reduced in
complexity! Thus a fixed point has no degrees of freedom,
no matter what size the phase space it lives in. A limit
cycle has one degree of freedom: one can parameterize the
cycle by a parameter that is 0 at the beginning and varies
smoothly to 1 once around. Thus any limit cycle is
effectively one dimensional since a single parameter can
characterize all its points, e.g., arc length. Similarly,
a torus is two-dimensional since two parameters (poloidal
and toroidal angles) specify all its points. A bit more
complicated for strange attractors, as we shall soon see.
- Decreasing volume does NOT mean attractors are
necessarily points or loops of zero volume. Volumes can
shrink to zero in many ways, including rapid expansion
in some directions compensated by stronger contraction in
others. We will see that this typically leads to fractal
objects, of dimension intermediate between that of areas and
volumes. Volumes in phase space are stretched, twisted, and
folded over and over again (boundedness!), leading to
strange attractors.
BASIN BOUNDARIES AND STABLE AND UNSTABLE MANIFOLDS
HOMOCLINIC TANGLES
- Consider an unstable periodic orbit that corresponds to an
unstable fixed point of some Poincare map G(X). The stable and
unstable manifolds W^s and W^u respectively can intersect at a
homoclinic point, and if there is a single intersection there
must be an infinity of such points. Further, the manifolds must
wind back and forth furiously to pass through successive
homoclinic points as the points converge towards the fixed
point in forward and backward time.
CHALLENGE: Show that stable manifolds can not intersect
themselves, nor can unstable manifolds intersect
themselves. Show that the stable manifold of one fixed point
can not intersect the stable manifold associated with a
different fixed point but can intersect the other fixed point's
unstable manifold.
- You can find a brief discussion of homoclinic tangles in
Section 4.3 of Ott's book "Chaos in Dynamical Systems" and a
more mathematical and detailed discussion in the advanced book
"Nonlinear Oscillations, dynamical systems, and bifurcations of
vector fields" by J. Guckenheimer and P. Holmes.
REVIEW OF POINTS SO FAR
- Dissipation of orbits greatly simplifies analysis of
dynamical systems by decreasing greatly the importance of
initial conditions.
- A set A in the phase space of a dynamical system X'=F(X)
or X[i+1]=G(X[i]) (for a flow and map respectively) is
called an ATTRACTOR if:
- There are orbits X(t;X0) outside of the set A which
end up in the set A in the limit of infinite time, t ->
Infinity.
- The set A is INVARIANT meaning that it is unchanged by
evolving the points of the set as initial
conditions. More casually, any point in A leads to an
orbit that stays in A for all time.
- The set A is minimal, i.e., no strict subset of A has
the above properties.
- The set of all initial conditions X0 that produce an orbit
X(t;X0) that end up on a particular attractor A in the
limit of infinite time is called the BASIN OF ATTRACTION
of the attractor.
- An attractor is called chaotic if there is average
exponential separation of orbits starting from initial
conditions on the attractor that are infinitesimally close
to one another.
Note: An attractor is called "strange" if the attractor is
fractal with a non-integer dimension D. A strange
attractor is NOT necessarily chaotic and a chaotic
attractor is not necessarily fractal. E.g., the logistic
map on the phase space [0,1],
x[i+1] = r x[i] (1 - x[i]) ,
has a strange but non-chaotic attractor at the special
value r* that is the accumulation point of all the
period-doubling bifurcations. For the value r=4, the map
is chaotic with Lyapunov exponent lambda_1=Log[2] > 0 but
the chaotic orbits fill the interval [0,1] densely and so
there is no attractor (or the attractor, as the unit
interval, is not strange).
- Attractors have zero volume in the N-dimensional phase
space of the dynamical system (average contraction of
phase-space volumes).
- Attractors and basin boundaries are often complicated wrinkled
fractal sets. In particular, it is essentially impossible to
characterize the set of initial conditions that will approach a
given attractor, e.g., because of the Wada property.
- Discussed briefly concepts of stable and unstable
manifolds, denoted by W^s and W^u respectively, for a
fixed point. Something truly extraordinary happens if W^s
intersects W^u at a point for a map: there is an infinity
of intersections and the stable and unstable manifolds
must wind back and forth in an extremely complex way. The
resulting mess is called a homoclinic tangle and helps to
explain how complex chaotic dynamics can arise from smooth
innocent-looking flows.
NEURAL NETWORKS: CONCEPTUAL APPLICATION OF ATTRACTORS TO NEUROBIOLOGY
- The concepts of attractors and basins of attraction
are already sufficient to lead to many interesting new
insights. The power of these concepts is already enough to
justify the study of nonlinear dynamics, whether or not
exotic subjects like chaos and fractals prove important to
you.
- A growing and important application of nonlinear dynamics is
understanding how the brain works, i.e., how the 10^10 neurons
with 10^13 interconnections of the human brain lead to its
remarkable information processing and its robotic capabilities
(e.g., drawing a picture, playing a musical instrument). One
traditional approach for understanding the brain has been
"artificial intelligence" (or "artificial stupidity" to its
cynics), which guesses that insights obtained from the theory
and design of digital computers will help to explain and to
design biological-like brains. A strong motivation for this
approach is the Turing-Church hypothesis that all digital
computers (i.e., computers that manipulate a finite number of
bits with each operation but can have an infinite amount of
memory) are equivalent and so the human brain, as a complex
device that is still a computer, must be understandable by the
same rigorous logic as digital computers.
It is a deep and somewhat undefined problem whether
physical systems like a human brain actually are
equivalent to Turing machines. Certainly, our pattern
recognition ability and high-level cognitive abilities far
outshine any existing computers and any computer being
currently designed (10 teraflops or so).
- A different approach to brain design is biological and
Darwinian: rather than worry about logical subtleties of
Turing machines and computability, we ask how is it
possible that evolution would lead to a brain-like
structure out of cellular components? That the brain arose
from evolution would imply that there were numerous little
steps of incremental competitiveness and so the giant
human brain evolved out of simple components without any
particular guiding principles or deep mathematical
principles.
- In particular, a biological theory of brain has to take
several facts into account:
- Brains are made up of unreliable components. Neurons die
at a high rate (hundreds of thousands per day for people
over the age of 20 or so), are strongly influenced by
drugs like alcohol, and yet the brain continues to
function effectively. Large amounts of neural tissue can
be excised and yet brain function such as memory or
pattern recognition retained.
- The brain is massively parallel: many computations of
the brain, e.g., fight or flight, are carried out in about
500 msec while the fastest neurons operate only about
100 times that fast. Given the slow speed of neuronal
communication, this implies that each neuron gets to
talk to only about 100 other neurons by the time a
thought has been carried out. This is possible only if
massively parallel computations are carried out.
- The brain (and biological organisms in general) have to
deal with a NOISY environment: one never has clean
well-defined objects, e.g., a predator may have
camouflage and move stealthily and it may be raining out
and it may be near dusk when there is not much light and
so discriminating between the background and a predator
is difficult.
CONTENT-ADDRESSABLE MEMORY
- One of the most interesting, and fairly recent conceptual
applications of nonlinear dynamics is the HOPFIELD MODEL
OF ASSOCIATIVE MEMORY. Psychologists and computer
scientists define associative memory (also called
CONTENT-ADDRESSABLE MEMORY) as information that is stored
by content, rather than by an index.
In simple computer codes, one typically stores numbers or
strings in arrays, and accesses this information by
looking up an array index, e.g., a[3] = "Mary's lamb".
Given the biological structure of brains, it seems rather
unlikely that animals store information in this manner
since an index like 3 has nothing to do with what is
stored and accessing a particular memory would then be
difficult. Instead, people have speculated that organisms
store memory by content, so given partial information
(e.g., the word "lamb"), the entire memory can be
recovered. This is clearly more useful from an
evolutionary point of view, since external stimuli can
trigger the appropriate response.
- People (presumably all mammals) have superb associative
memories: given a smell, a sound, a touch, part
of a word, part of a sentence, part of a figure, five
notes from a song, etc., entire memories can be triggered
in great detail. It is not yet known how this is
accomplished in terms of actual neurophysiology.
John Hopfield (who was at Bell Labs in New Jersey when he wrote
his famous 1982 paper and who is presently at Cal Tech) had
the insight that associative memory might be a nonlinear
dynamical effect that could spontaneously arise by evolutionary
selection from interconnected nonlinear neurons. Associations
were simply different initial conditions in the same basin of
attraction, and the memory (usually a fixed point) was the
attractor towards which all points in the basin
flowed. Hopfield rightfully became quite famous by showing how
to CONSTRUCT nonlinear models whose attractors were given
memories, and by being able to prove several important theorems
about his model. His paper proved to be enormously stimulating
and has led to thousands of other research papers in many
different disciplines such as engineering, robotics, computer
science, psychology, and physics. There is still much active
research about Hopfield models to this day, as you can attest
for yourself if you browse through journals in the Vesic
Engineering Library or browse the world wide web.
- Hopfield constructed his model from N mathematical neurons
that were highly abstracted from actual biological
neurons. Each "neuron" was a simple device whose state was
discrete, with just two possible values -1 or 1. The ith
neuron was connected to all the N-1 other neurons j with
N-1 wires each of which were weighted by some numerical
connection strength t[i,j] (so there are "N take 2" or
N(N-1)/2 wires in all for a Hopfield net of N neurons.)
Hopfield called these connection strengths the t[i,j]
matrix and assumed unrealistically but insightfully and
productively that this matrix was symmetric:
t[i,j] = t[j,i] ,
that neuron i talked to neuron j with the same strength as
neuron j talked to neuron i.
In actual neurological tissue, a neuron is a complex
electrical-chemical cellular blob with many dendrites
("wires") feeding into the main body and a single axon
coming out. The connections of the dendrites to the
neuronal body are themselves quite complex. When enough
excitatory signals come in, the axon fires, sending action
potentials of fixed magnitude but variable frequency down
the axon to other neurons. (The outgoing single axon
eventually ramifies and touches upon many other neurons.)
A single neuron is itself very complicated, so Hopfield's
model is a great simplification.
This simplification is the essence of good modeling and good
theoretical insight and is an important lesson for those of you
who are budding theorists: the best progress, in the sense of
the deepest insights, often comes from simplifications of a
physical system, rarely from direct frontal attacks on the full
equations using all available details. We saw this previously
throughout the course, e.g., Lorenz's discovery of his three
equations as a caricature of the complicated dynamics of the
five three-dimensional nonlinear partial differential equations
(the so-called Boussinesq equations) that experiments show to
give a quantitatively accurate description over many ranges of
parameter values.
- A memory then was defined by some fixed point of the
dynamics, i.e., assignments of -1 or 1 to each neuron. The
associations of that memory are all points in the basin of
attraction. There must definitely be associations that
give spurious memories, the points on boundaries between
basins of attraction, but these are of measure zero in
phase space.
- The dynamics of the model is remarkably simple. At
a random time, each neuron, say the ith neuron, "wakes
up", and forms a weighted sum of the states of all other
neurons except itself:
h[i] = Sum[ t[i,j] s[j] , {j != i , 1 , N} ] .
The neuron then changes its internal state according to
the simple rule:
If h[i] > 0 change state i to 1
If h[i] <= 0 change state i to -1
Using the language of neurobiology, this rule implies that
positive matrix elements t[i,j] > 0 are "excitatory" and
encourage a change to an active state (value 1), while
negative matrix elements are "inhibitory" and push the
cell towards being inactive (value -1).
This random waking up and possibly changing a state
repeats over and over again until some fixed point is
approximately reached. Thus this generates an orbit in a
discrete N-dimensional phase space where N is the total
number of bits used to store any memory in the net. Unlike
other phase spaces we have discussed previously in the
course, this phase space has a finite number of states,
namely 2^N patterns since each neuron has a 1 or -1 state.
- This algorithm is ASYNCHRONOUS and PARALLEL, just as
biological networks of neurons seem to be. Furthermore,
the stored memories are NON-LOCAL, distributed over the
connection strengths. Partial destruction of neurons
or links has a modest effect on memory. This nonlocality
has been shown in numerous experiments, e.g., Lashley's
famous experiments of the 1940's, in which rats could
still run mazes correctly with substantial portions of
their brains removed. Nonlocality is not universal in the
brain, there are special centers for speech, breathing,
etc., for which local damage removes the associated
capability. Anatomists have identified over 200
specialized regions of brain tissue; these are
conjectured to be fairly independent processing systems.
- Hopfield was able to show that this model had to settle
down to some finite number of fixed points as long as
t[i,j] was a symmetric matrix. The proof is rather simple,
based on the idea that if a physical dynamical system is
dissipative, one can sometimes find an energy-like
quantity E[S], A LYAPUNOV FUNCTIONAL, that decreases along
each orbit. If this quantity is bounded from below, all
states must eventually reach a stationary state when the
minimum value is attained.
Thus for the damped harmonic oscillator m x'' + a x' +
m w^2 x = 0, there is the total energy:
E = (1/2) m (x')^2 + (1/2) m w^2 x^2 ,
and one easily verifies that E[x',x] decreases
monotonically along any orbit except a fixed point:
dE/dt = m x' x'' + m w^2 x x'
= x' ( m x'' + m w^2 x )
= - a (x')^2 with a > 0.
But E is bounded from below by zero, (x')^2 and x^2
can't become smaller than 0, so all dynamics tend
towards the fixed point x1=x2=0.
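This little computation can be verified symbolically in Mathematica
(my addition; note that the symbol E is reserved for Euler's number,
so the energy is called energy here):
energy[t_] := (1/2) m x'[t]^2 + (1/2) m w^2 x[t]^2
Simplify[ D[energy[t], t] /. x''[t] -> -(a/m) x'[t] - w^2 x[t] ]  (* gives -a x'[t]^2 *)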
This trick works in more sophisticated contexts. Thus the
more complicated two-dimensional pde, the Swift-Hohenberg
model:
Dt[ psi ] = ( eps - (DEL + 1)^2 ) psi - psi^3 ,
where DEL denotes the Laplacian, has an associated Lyapunov
functional:
E[ psi ] = Integrate[ dx dy (
  - (1/2) eps psi^2 + (1/4) psi^4 + (1/2) [ (DEL + 1) psi ]^2
) ]
and using integration by parts, one can show that:
dE/dt = - Integrate[ dx dy ( Dt[psi] )^2 ] .
Thus the right side is always negative until a minimum is
reached, at which point psi becomes stationary in
time. This proves the remarkable result that
Swift-Hohenberg can only have complicated transients
leading to stationary states. Again, Swift and Hohenberg
did this backward: they CONSTRUCTED a model to have an
associated Lyapunov functional, to aid their physical
theory.
- Similarly, Hopfield constructed his model to have a
Lyapunov function. Thus he defined the quantity:
E = - (1/2) Sum[ t[i,j] s[i] s[j] , {i,1,N}, {j,1,N} ]
Although reasonable in hindsight, Hopfield was led to this
expression by his background as a condensed matter
theorist. This expression corresponds to the energy
(Hamiltonian) of a physical material called a "spin
glass", which is a magnetically disordered metal, e.g.,
dilute concentration of iron atoms randomly dissolved in a
copper matrix. Spin glasses form strange physical states
that are intermediate between ferromagnets and ordinary
metals, the magnetic spins are locked in position but
randomly positioned in space, and there are strange phase
transitions as temperature or external magnetic fields are
varied. Spin glass physics has been an important branch of
theoretical physics in the last 25 years, with
implications for mathematical optimization, computer
science, and biology.
- The change in energy when the ith neuron wakes up and
changes its state is given by:
dE = - d(s[i]) Sum[ t[i,j] s[j] , {j != i} ] ,
where d(s[i]) is the change in the state of the ith
neuron. But by construction of the algorithm, if the sum
is positive, the change in state d(s[i]) is either zero (1
to 1) or positive (-1 to 1), and if the sum is negative or
zero, d(s[i]) is either zero or negative, so in all cases
the energy must stay the same or decrease. After some finite
time, E must reach a minimum value, implying a fixed
point. The symmetry of
the matrix t[i,j] is essential for this argument.
Hopfield showed empirically in his paper that small
asymmetric perturbations of a symmetric matrix did not
change his conclusions of convergence towards a fixed
point.
- Another important contribution of Hopfield was to show how
to construct the connection matrix t[i,j] explicitly so
that desired memories or patterns occur at the fixed
points. If we denote by V^s_i the n desired patterns,
s=1,...,n, with bits -1 and 1 at each site
i=1,...,N of the N neurons, then the prescription is:
t[i,j] = Sum[ V^s_i V^s_j , {s, 1, n} ] for i != j
t[i,i] = 0 for i = j
You can think of these bits as pixels (black or white) on
a screen, or as some quantization of data into certain
kinds of objects.
- In order to work, Hopfield's memories V^s_i have to be
approximately ORTHOGONAL, another non-biological
requirement. To see this, we can ask whether the above
matrix and algorithm do indeed preserve any of the given
states V^s_i if one starts in that state. Thus assume that
the entire network is in the mth memory, V^m, and assume
that the ith neuron wakes up to form its input sum and
evaluation. Then this neuron constructs the sum:
h[i] = Sum[ t[i,j] V^m_j , {j, 1, N, i != j} ]
= Sum[ V^s_i Sum[ V^s_j V^m_j , {j,1,N} ] , {s, 1, n} ]
If the bit vectors V^s are approximately orthogonal, their
bits are roughly randomly chosen to be -1 and 1, we can
approximate:
Sum[ V^s_j V^m_j , {j,1,N} ] approx N delta[s,m]
Substituting this, we see that
h[i] approx N V^m_i ,
which has the same sign as V^m_i, and so the state V^m is
approximately a fixed point of the dynamics.
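A quick numerical check of this argument (my addition) for random,
hence roughly orthogonal, patterns; the number of neurons, number of
patterns, and random seed are arbitrary choices:
SeedRandom[1]; npat = 5; nneur = 200;
patterns = RandomChoice[{-1, 1}, {npat, nneur}];
tmat = Sum[ Outer[Times, patterns[[s]], patterns[[s]]], {s, npat} ];
Do[ tmat[[i, i]] = 0, {i, nneur} ];                   (* zero the diagonal *)
h = tmat . patterns[[1]];                             (* input sums for stored pattern 1 *)
Count[ Sign[h] - patterns[[1]], 0 ]                   (* equals nneur when pattern 1 is preserved *)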
- The requirement that the states be roughly orthogonal
is not too bad for a few states, but as one tries to store
more and more memories, new memories are less and less
orthogonal to existing memories and behave less and less
well as fixed points. There are technical ways to
eliminate orthogonality altogether as a requirement (see
various books by Kohonen, a Finnish theorist who has made
many important contributions to neural networks).
Biologically, one may conjecture that all important
memories are COMPRESSED, i.e., redundant bits are removed
(e.g., 100 1's would be replaced by the number of 1's, a
much smaller amount of storage). Compression tends to
make bits more random and hence the memories more
orthogonal.
CHALLENGE: Try working out a compression scheme and
verify that states of -1's and 1's become more
orthogonal under compression (look at the binary values
of the compressed data as stored on the computer).
REVIEW OF RECENT POINTS:
- Discussed background and context of John Hopfield's neural
network model of a content-addressable (associative)
memory. Some key points:
- Biological brains are massively parallel,
asynchronous, and work with slow, faulty components.
- Biological organisms need to process information that is
often noisy; precise logical algorithms are not going to
work. As one example, one can not use gradient methods
to optimize some feature of sensor input since noise
prevents local derivatives from being well-defined.
- Hopfield's key insight was to store memories as stable
fixed points and to use the basins of attraction as the
"associations" of the fixed points. He figured out a
clever parallel asynchronous scheme with simple
computing elements that stored patterns (memories) in
the "N Take 2" interconnection strengths t[i,j] of N
neurons whose states could be 1 or -1.
Given P patterns of N pixels (i.e., P Boolean vectors of
length N), Hopfield showed how to calculate the t[i,j]
connection elements so that the given patterns were
stable fixed points of the asynchronous parallel
algorithm.
- Strongly recommend that you try playing with a Java applet or other
code example to get a direct feeling for how a Hopfield network
performs. Try storing several letters (e.g., A, B, C, and D) in
a 5x5 network and study how various starting values on the 25
pixels converge or don't converge towards these letters. For
four memories, can you find spurious fixed points that are
stable but shouldn't be there?
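In that spirit, here is a minimal Mathematica sketch of a Hopfield
network (my own toy version, with random patterns standing in for the
letters); the network size, the number of flipped bits, and the number
of asynchronous updates are arbitrary choices:
SeedRandom[2]; nneur = 100; npat = 3;
patterns = RandomChoice[{-1, 1}, {npat, nneur}];          (* random stand-ins for letters *)
tmat = Sum[ Outer[Times, patterns[[s]], patterns[[s]]], {s, npat} ];
Do[ tmat[[i, i]] = 0, {i, nneur} ];
update[state_] := Module[ {i = RandomInteger[{1, nneur}], s = state},
    s[[i]] = If[ tmat[[i]] . s > 0, 1, -1 ]; s ]          (* one asynchronous update *)
noisy = patterns[[1]]; flips = RandomSample[Range[nneur], 15];
noisy[[flips]] = -noisy[[flips]];                         (* corrupt 15 of the 100 bits *)
final = Nest[ update, noisy, 2000 ];
{ noisy . patterns[[1]], final . patterns[[1]] } / nneur  (* overlap with memory 1 rises from 0.7 toward 1 *)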
STORAGE CAPACITY OF A HOPFIELD NETWORK
- Given N neurons, Hopfield found that his model could only
store a finite number of memories, empirically 0.15N. This
is MUCH smaller than the total number of 2^N possible
states. As the number of memories stored starts to exceed
about 0.15N, the efficiency and accuracy of the associative
memory starts to degrade: it takes longer to reach a given
fixed point and more spurious fixed points appear
(spurious means that the neural network settles into fixed
points other than the intended ones V^s used to construct
the t[i,j] matrix).
- REM sleep: Francis Crick, Hopfield, and other researchers
conjectured that, if human memory were based on attractors,
it would be important to forget information in order to
clarify the most important memories, and they speculated that
this was the reason for some kinds of dreaming (REM sleep). In
the context of the Hopfield model, "clarify" means making
memories more orthogonal so that convergence is more rapid
and spurious memories are reduced.
SCALE INVARIANCE: A WEAKNESS OF THE HOPFIELD MODEL
- Weakness of Hopfield models: in both vision and hearing,
people can extract information from an object that is
greatly transformed from some standard shape. Thus a
pencil may be rotated, translated, and rescaled to some
arbitrary point in space, the ambient light may be bright
or dim, and with different colors, yet people have no
trouble identifying the pencil (unless it is head on to
the viewer, in which case it looks like a dot). Similarly,
people understand language even when spoken loudly or
softly or when badly slurred because of colds or
drugs.
This invariance of understanding under continuous
transformations of rotation, translation, or dilation is a
remarkable feat of human and animal senses, and is HARD
to build into a Hopfield or similar mathematical
network. Thus a Hopfield network, if exposed to a single
image of a pencil (say a line segment), may converge to
the wrong memory if the line segment is rotated 45 degrees
or shifted over by a few pixels.
- Scientists conjecture that animals use sophisticated
specialized PREPROCESSORS to map an input into some
canonical input and THIS processed input is what is then
fed into an associative neural network. Hubel and Wiesel,
two famous experimental neurophysiologists, showed that
some neurons in the brain often perform edge or motion
detection and nothing else, which are certainly some of
the properties that a preprocessor would have.
SCALABILITY: HOW TO ACHIEVE IT?
- Another difficulty of Hopfield networks, and in fact of
all current approaches to AI, either logical or dynamical,
is how to scale from plausible models of a few million
neurons and a few thousand memories to human-levels of
computation which involve 10^10 neurons and hundreds of
megabytes of sound, sight, touch, and taste. Presently, no
one knows how to scale up a Hopfield network to perform
with that much data.
HOW MUCH MEMORY IS STORED IN AN ADULT HUMAN BRAIN? 200 MB?
- Interjection: How much information do people actually
have in their head, i.e., how many memories N might be
stored in the human brain if we considered the brain to be
a giant Hopfield network with 10^10 neurons and 10^13
interconnections? Using several different methods, the
best guesses by psychologists approximately agree and give
of order 10^9 bits, i.e., 1 gigabit or roughly a hundred
megabytes. This is a pretty large amount but still less than
the 10^10 neurons in your head and the 10^13 connections between
neurons. This order of magnitude estimate is roughly
compatible with Hopfield's estimate of 0.15N, although no
one presently believes that Hopfield is even close to
being biologically correct.
- THOUGHT QUESTION: Does biology use dynamics along the
lines of Hopfield, or does it just calculate the nearest
fixed point in phase space according to some
metric? The latter seems more straightforward conceptually
and mathematically but is awkward to achieve as a
spontaneous algorithm arising from evolution. How would
you keep track of all the fixed points, from which you
would then try to look up the nearest neighbor? Basins of
attraction may have more complicated and interesting shapes,
which may be important. Perhaps some of you will succeed
in answering this question with future research.
- References:
- "Neural networks and physical systems with
emergent collective computational abilities" by
J. J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554-2558
(1982)
- "Unlearning has a stabilizing effect in collective memories"
J. J. Hopfield, D. I. Feinstein, and R. G. Palmer, Nature 304,
158-159 (1983)
- "How Much Do People Remember? Some Estimates of the
Quantity of Learned Information in Long-Term Memory"
by Thomas K. Landauer, J. of Cognitive Psychology,
1988.
- "Neural Networks" by Simon Haykin (Macmillan, New York,
1994); reasonably up-to-date and readable textbook about
mathematical neural networks.
REVIEW OF IMPORTANT RECENT POINTS:
- Finished our discussion about Hopfield networks. Discussed
fact that N-neuron network can store about 0.15N memories
before convergence becomes slow or many spurious memories
start to appear.
- For a N-dimensional flow X'=F(X), a (N-1)-dimensional
Poincare map is obtained by using the flow to take one
point to a successive point on some (N-1)-dimensional
surface (e.g., a hyperplane) that intersects some bounded
orbit of the flow. There is an infinity of Poincare maps
corresponding to many different ways of choosing the
Poincare section or surface but they all have the same
dynamics: they are all periodic, quasiperiodic, or chaotic
together.
FRACTALS AND FRACTAL DIMENSIONS
- I now turn to the next two topics of the course:
fractals and fractal dimensions. Fractals themselves could
(and often do) take up an entire semester of their own
because of the widespread appearance of fractals in
nature and in many artificial phenomena. Here, I address
mainly the parts of the subject that help to understand
the complexity of a dynamical flow.
- The big picture is this. A bounded orbit of some
dynamical dissipative system traces out some kind of
attractor in phase space, after transients have
decayed. It is possible both theoretically and empirically
to assign a number to the attractor---its fractal
dimension D---that indicates its "complexity". The
dimension D is often not an integer, hence the term
"fractal" meaning a dimension with non-zero fractional
part. A simple mathematical theorem says that any set of
fractal dimension D must be contained in a phase space of
at least dimension Ceiling[D] (the smallest integer greater
than or equal to D), and so D gives a lower bound on the
dimension of the smallest possible phase space that could
generate the observed dynamics. Chaotic systems are typically
taken to be deterministic systems of modest fractal
dimension, say D <= 5. This number D is a natural way to
estimate the complexity of a dynamical system via time
series measurements.
- A final piece in the puzzle is that one can recover
fractal dimensions of attractors from empirical time
series WITHOUT KNOWING THE DYNAMICAL SYSTEM. This amazing
result holds under general and weak mathematical
assumptions, e.g., no experimental noise, a deterministic
origin of the signal, and smooth nonlinear transformations
of a given signal. As usual with mathematical theorems,
there are many delicacies in extracting fractal dimensions
from actual time series. You explore some of these
subtleties in the last homework assignment.
- Our discussion of fractal dimensions of sets and
of attractors will complete an important issue raised
earlier in the course: how to distinguish time series that
have broad-band power spectra. We saw before that
power spectra lose their usefulness when there are
broad-band features: nothing substantial seems to change
in the spectrum as parameters are varied. Unlike power
spectra, the fractal dimension (provided that one can
calculate it reliably) does vary with parameters and often
in an interesting way.
WHAT IS A FRACTAL SET?
- Fractals are sets of points that are SELF-SIMILAR
or SCALE-INVARIANT (the physicist's language). Little pieces
of a fractal set, when magnified, look identical (up to
rotation and translation) to the original set, i.e., they
have no characteristic length scale.
- A circle is NOT fractal since any small piece,
when magnified, looks like a nearly straight arc and not
like a circle. Atoms and people are not fractal; they have
a characteristic length scale. A finite set of points can
not be fractal since magnifying the vicinity of any one
point does not reproduce the original set. A single point,
a line, and a plane are scale-invariant but have a boring
geometric structure (integer dimension), so they are not
usually called fractal.
- The essence of the quantitative definition of a fractal
is the concept of SCALING, a term made precise by physicists
developing the theory of equilibrium phase transitions but
understood informally by many people over the
centuries. Thus a fractal is a set such
that some scalar attribute, e.g., the volume or mass
contained in some ball centered on a point in the fractal,
varies as a POWER LAW as the radius of the ball diminishes
to zero. Nearly anytime a power-law behavior is observed
physically between two quantities, say y = c x^b for
constants c and b, there is likely a fractal involved in
the physics.
EXAMPLES:
- BOX-COUNTING DIMENSION of a set S: carve all of space up
into non-overlapping cubes of size eps and let N(eps) be
the number of boxes that contain at least one point of
the set. Then the box-counting dimension D is defined by
the scaling relation
      N(eps) propto eps^{-D}  as  eps -> 0 .
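  As a concrete numerical illustration, here is a minimal Python
  sketch of box counting for a finite sample of points in the plane
  (the helper name count_boxes, the test set, and the box sizes are
  just illustrative assumptions):

      import numpy as np

      def count_boxes(points, eps):
          # Number of grid boxes of side eps that contain at least one
          # point; points is an array of shape (npts, ndim).
          boxes = np.floor(points / eps).astype(int)
          return len({tuple(b) for b in boxes})

      # Example: points sampled densely along the unit circle; the slope
      # of log N(eps) versus log(1/eps) should come out close to 1.
      t = np.linspace(0.0, 2.0 * np.pi, 20000)
      pts = np.column_stack((np.cos(t), np.sin(t)))
      for eps in (0.1, 0.05, 0.025, 0.0125):
          print(eps, count_boxes(pts, eps))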
- CAPACITY C: somewhat harder to calculate is the capacity
of a set S. Instead of carving space into preexisting
boxes, independent of any set S under consideration, now
let N(eps) denote the MINIMUM number of boxes of size
eps needed to cover the set. Then the scaling relation
N(eps) propto eps^{-C} as eps -> 0 defines the capacity C if
the limit exists. For many simple sets, the capacity
equals the box-counting dimension; for strange attractors
and natural fractals, however, dimensions defined in
different ways need not agree (see the discussion of
multifractals below).
- CORRELATION EXPONENT nu: the box-counting dimension and the
capacity definitions don't take into account the
frequency of visiting different parts of phase space. It
turns out that strange attractors are typically quite
singular sets from a dynamical point of view in that
nearly all the time of a given orbit is spent in a small
fraction of the attractor, i.e., the different parts of
the attractor are not visited with equal probability. A
still different dimension---called historically the
"correlation exponent" and denoted by the lower case
Greek letter nu---comes from asking how many pairs of points
of a given set are within distance eps of each other:
      N(eps) = "# of pairs of points closer than eps" ,
which scales as N(eps) propto eps^nu as eps -> 0.
This is a measure-based estimate of the fractal
dimension and is often the easiest and most robust to
compute numerically since it automatically takes into
account the frequency of visitation of the set.
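  A minimal Python sketch of this pair-counting estimate (in the
  spirit of the Grassberger-Procaccia algorithm); the brute-force loop
  over pairs and the test data are illustrative assumptions only:

      import numpy as np

      def correlation_sum(points, eps):
          # Fraction of distinct pairs of points closer to each other
          # than eps (brute force, O(npts^2) distances).
          npts = len(points)
          count = 0
          for i in range(npts - 1):
              d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
              count += np.count_nonzero(d < eps)
          return 2.0 * count / (npts * (npts - 1))

      # The correlation exponent nu is then the slope of
      # log C(eps) versus log(eps) in the small-eps scaling region.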
- There is an uncountable infinity of other fractal
dimensions that can be defined. It turns out that all
the fractal dimensions are related to a special function
D_q (D subscript q) where q is a real number. Thus D_0
is the capacity of a set, D_1 is the so-called
information dimension of a set, and D_2 is the
correlation exponent of the set. One can prove an
inequality that
D_q >= D_q' for q' >= q.
Sets for which D_q is not a constant are called
MULTIFRACTALS and this is the most common case in
nature.
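  Presumably D_q here is the usual Renyi spectrum of generalized
  dimensions, D_q = Limit[ (1/(q-1)) Log[ Sum_i p_i^q ] / Log[eps],
  eps -> 0 ], where p_i is the fraction of points of the set that
  fall in box i. A minimal Python sketch that estimates D_q from two
  box sizes (the box sizes and data are illustrative; q=1 requires
  the separate information-dimension formula):

      import numpy as np

      def renyi_sum(points, eps, q):
          # Sum over occupied boxes of p_i^q, with p_i the fraction
          # of the points falling in box i (q != 1).
          boxes = np.floor(points / eps).astype(int)
          _, counts = np.unique(boxes, axis=0, return_counts=True)
          p = counts / counts.sum()
          return np.sum(p ** q)

      def D_q_estimate(points, q, eps1, eps2):
          # Finite-size slope estimate of D_q between two box sizes.
          s1 = renyi_sum(points, eps1, q)
          s2 = renyi_sum(points, eps2, q)
          return (np.log(s2) - np.log(s1)) / \
                 ((q - 1.0) * (np.log(eps2) - np.log(eps1)))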
- True mathematical fractals can not exist in nature,
for the simple reason that quantum mechanics provides a
cut-off in length scales of order the Bohr radius (about
one Angstrom). The best one can say of a natural system
is that it exhibits approximate fractal scaling over some
RANGE of length or time scales.
- Fractals can also be statistically but not
geometrically self-similar, in that magnified pieces have
the same statistical properties as the original
piece. This is the case for the path traced by a particle
undergoing Brownian diffusion. The dollar-yen exchange
rates were found by you to be statistically fractal, an
extremely interesting result.
TWO MATHEMATICAL EXAMPLES OF FRACTALS
- Decimation construction: the middle-thirds Cantor set is
obtained by taking the unit interval and recursively wiping
out the open middle third of each remaining piece. Each step
reduces the remaining length by one third (multiplying it by
2/3), so the length of the limiting set must be zero. Yet
there are uncountably many points in the Cantor set, since
these points can be put in one-to-one correspondence with
all real numbers in [0,1] whose base-three expansions
contain only 0's and 2's (no 1's). This set is self-similar:
any small segment of it can be magnified to obtain a set
that is identical to the
original one. We will see that one of the fractal
dimensions for this set, the capacity, has the value
D=ln(2)/ln(3) approx 0.63093, so this set is closer to
being a line (D=1) than a point (D=0).
CHALLENGE: To test your understanding of the Cantor set
geometry, find a point in the middle-third Cantor set that is
NOT a boundary point for one of the closed intervals used in
taking the limit to construct the set.
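  A minimal Python sketch of the decimation construction above, just
  to confirm the counting: after M steps there are 2^M closed
  intervals of total length (2/3)^M (the function name is an
  illustrative choice):

      def cantor_intervals(M):
          # Closed intervals remaining after M middle-third removals,
          # starting from the unit interval [0, 1].
          intervals = [(0.0, 1.0)]
          for _ in range(M):
              new = []
              for (a, b) in intervals:
                  third = (b - a) / 3.0
                  new.append((a, a + third))       # keep the left third
                  new.append((b - third, b))       # keep the right third
              intervals = new
          return intervals

      pieces = cantor_intervals(6)
      print(len(pieces))                        # 2^6 = 64
      print(sum(b - a for (a, b) in pieces))    # (2/3)^6, about 0.088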
- Similarly the Koch snowflake is a limiting set
obtained by inserting line segments recursively into an
equilateral triangle. This leads to a bounded object of
finite area but infinite perimeter, with fractal dimension
(capacity) D=ln(4)/ln(3) approx 1.262. You should be able to
show that the set obtained from the unit triangle (sides of
length 1) must lie in a square of side (2/3) Sqrt[3], so its
total area must be less than 4/3.
FRACTALS IN NONLINEAR DYNAMICS
- Attractors of bounded flows and maps are often
fractal. NOTE: being strange (having sensitive dependence
on initial conditions) and being fractal are independent
properties: a strange attractor need not be fractal, and a
fractal attractor need not be strange. Thus the logistic map
has a fractal but non-strange attractor at the accumulation
point of period doublings near r=3.57.
- Basins of attraction are often fractal, e.g., the basins
of the cubic roots of unity under Newton's numerical
root-finding algorithm. For an example from the engineering
literature, see Moon and Li, Physical Review Letters 55,
1439-1442 (1985), on a periodically forced particle in a
two-well potential.
- Sets of parameter values that lead to a given
behavior (e.g., a periodic limit cycle) are often fractal.
- These three common occurrences of fractal sets
explain why traditional perturbative analytical approaches
are doomed to fail in developing theories of nonlinear
equations. One can not capture infinite details of
fractals in smooth expressions. This does NOT mean
analytical analysis is hopeless, just that a completely
different and nontraditional approach is required.
PHYSICAL EXAMPLES OF FRACTALS
- It is a marvelous and not fully understood fact that
fractals are common in many aspects of nature, a
point most forcefully put forward by Benoit Mandelbrot
of IBM. Some examples:
- mixing of fluids (ink in glycerine)
- clouds (D approx 2.5)
- turbulent fluids (constant-velocity-component surfaces)
- surfaces of mountains
- seashores
- clusters of stars in sky, clusters of galaxies
- branching in biological systems (arteries, lungs, brain, trees),
which clearly arises from the need to maximize the ratio of
surface area to volume, i.e., to aid transport.
- earthquakes (Gutenberg-Richter law of intensities)
- Fractal star distributions: check out the MPEG movie
available at the site:
http://oposite.stsci.edu/pubinfo/education/amazing-space/hdf/intro13.htm
which shows successive magnifications of a part of the sky
made possible by the power of the Hubble space
telescope. This movie gives a keen sense of the remarkable
distribution of stars throughout the universe.
MECHANISM FOR MAKING FRACTALS: MIXING BY STRETCHING AND FOLDING
- A very important and applied example of fractals
arises from the literal MIXING of two fluids by mechanical
stirring. If the two fluids do not readily interdiffuse
(e.g., oily ink and water), one gets intricate folded sheets
with a fractal structure. What is extremely important is that
the mixing is not uniform no matter how long one stirs: there
are many thin regions where no mixing occurs. The fractal
dimension gives a measure of the efficiency of mixing.
- Strange attractors are fractal orbits that arise
from mixing in phase space, a consequence of boundedness,
dissipation, and sensitive dependence on initial
conditions. Dissipation implies that phase-space volumes
shrink to zero volume, while sensitive dependence implies
that the contraction is not uniform: some directions stretch
greatly while others contract greatly. Boundedness implies
that phase-space volumes must fold over and over again.
PHYSICAL MECHANISM FOR MAKING FRACTALS: DIFFUSION LIMITED AGGREGATION (DLA)
- A particularly elegant example of a fractal, since the
physical mechanism is known and is trivial: let a point
random walk on a lattice (of any kind) and let it 'stick'
when it comes in contact with a seed point or with previously
stuck particles. Repeat for a large number of points. One
obtains a wispy structure of dimension about 1.7, independent
of the lattice geometry (e.g., square or hexagonal) and of
the random number generator (see the sketch below). This is
the same dimension observed in many experiments, e.g., in
electrochemical deposition to grow metal crystals at an
electrode immersed in a solution.
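  A minimal Python sketch of such a DLA simulation on a square
  lattice (the cluster size, launch radius, and kill radius below are
  illustrative choices, and a serious simulation would need to be far
  more efficient):

      import math, random

      def dla_cluster(n_particles=300):
          # Grow a diffusion-limited-aggregation cluster on a square
          # lattice, starting from a single seed at the origin.
          cluster = {(0, 0)}
          r_max = 1                       # current radius of the cluster
          while len(cluster) < n_particles:
              # Launch a random walker on a circle outside the cluster.
              r_launch = r_max + 5
              ang = random.uniform(0.0, 2.0 * math.pi)
              x = int(round(r_launch * math.cos(ang)))
              y = int(round(r_launch * math.sin(ang)))
              while True:
                  dx, dy = random.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
                  x, y = x + dx, y + dy
                  if x * x + y * y > (r_launch + 20) ** 2:
                      break               # wandered too far away; discard
                  # Stick when a nearest-neighbor site is occupied.
                  if ((x + 1, y) in cluster or (x - 1, y) in cluster or
                          (x, y + 1) in cluster or (x, y - 1) in cluster):
                      cluster.add((x, y))
                      r_max = max(r_max, int(math.hypot(x, y)) + 1)
                      break
          return cluster

  The fractal dimension can then be estimated from the mass-radius
  scaling N(r) propto r^D by counting cluster particles within radius r.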
- DLA fractals have the odd property that their densities
DECREASE as more mass is added. Most objects
have constant density as they get bigger, e.g.,
a glass of water or a bar of steel. To see this:
- Each particle has mass 1, so the total mass is just the
number of particles N.
- Being fractal implies that N is proportional to L^D,
i.e., the number of particles grows with the linear size L
of the cluster with exponent equal to the fractal dimension D.
- The actual volume occupied is of order L^E, where E is the
dimension of space, with E > D.
- Then density propto N / L^E = L^D / L^E = L^(D-E), which
decreases with increasing size L of the system.
- DLA fractals are relevant to:
- electrochemical deposition
- thin-film morphology
- dendritic solidification
- dielectric breakdown
- viscous fingering.
THE BIG MYSTERY: WHY FRACTALS?
- The mystery is why fractal structures occur. In most cases the
fractal structure has never been predicted a priori, and is not
understood a posteriori. The presence of fractals means that
the physics is scale invariant over many length or time scales,
but why this property should hold is not understood in any deep
way. This is a crucial unsolved mathematical problem. It is
irritating that we can not even prove that a fractal set
arises in the seemingly simple case of DLA.
HOW TO CHARACTERIZE FRACTAL SETS? USE DYNAMICAL INVARIANTS
- Visual insight into fractal sets is limited to phase spaces
of dimension E <= 3, which is too restrictive. We need
more quantitative tools for extracting properties of
fractal sets.
- The ability to recover some information about attractors
by embedding a time series (or several time series) raises
the mathematical question about what information is
preserved. Experiments generally measure complicated
functions of the original observables, e.g., a light beam
may enter a fluid, be scattered, be focused, then
digitized to get a time series. This means that any useful
quantity to characterize a time series or the
reconstructed attractor should be INVARIANT under general
smooth NONLINEAR transformations.
- This rules out common concepts like positions or
heights of peaks in a power spectrum, the size of a set,
the period of an orbit, since these are all distorted
substantially by measurement.
- Instead, one must study more subtle properties of sets.
One simple property that should be preserved is the
topological structure of Poincare sections. Thus general
smooth mappings should not change a periodic orbit
(consisting of a finite number of points) into a
nonperiodic orbit, e.g., a continuous curve as occurs for
a torus.
- The simplest invariant is the DIMENSION of the set of
points constituting the attractor in phase space. There
are many ways of defining dimension of a set, but they all
involve studying how some LOCAL property of the set varies
as a function of the "diameter" of some ball. Most
dimensions are clearly invariant since they are based on
counting how many points are in a given ball. This number
does not change when the ball and any points inside this
ball are mapped to new points by smooth arbitrary
nonlinear transformations.
- Another property is how nearby points separate over
time. If the nonlinear transformation is continuous, then
points that stay close to each other along the original
orbits clearly stay close along the transformed
orbits. Further, the rate at which points separate should
be independent of the coordinate system. This observation
leads to the idea of LYAPUNOV EXPONENTS, which
characterize dynamical properties of flows.
- We have already discussed the first and largest Lyapunov
exponent, the average rate of separation of nearby
orbits. One can also consider groups of three points in a
plane, four points in a volume, etc., where the points
start very close to each other. This leads us to define
the average growth rate of lines, areas, volumes, etc.,
from which we can define further exponents. A system
living in an N-dimensional phase space has N Lyapunov
exponents, lambda[i]. There is a rather extraordinary and
beautiful formula that gives a fractal dimension in terms
of the Lyapunov exponents (see the sketch below). Because of
this formula, the Lyapunov exponents are considered the most
fundamental of the invariants of a set.
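  The formula alluded to here is presumably the Kaplan-Yorke
  (Lyapunov) dimension, D = K + (lambda[1]+...+lambda[K])/|lambda[K+1]|,
  where the exponents are sorted in decreasing order and K is the
  largest index for which the partial sum is still non-negative. A
  minimal Python sketch (the exponent values in the example are
  illustrative numbers of roughly Lorenz-attractor size):

      def kaplan_yorke(exponents):
          # Lyapunov (Kaplan-Yorke) dimension from a list of Lyapunov
          # exponents.
          lam = sorted(exponents, reverse=True)
          s, K = 0.0, 0
          for l in lam:
              if s + l < 0.0:
                  break
              s += l
              K += 1
          if K == len(lam):
              return float(K)        # partial sums never go negative
          return K + s / abs(lam[K])

      print(kaplan_yorke([0.9, 0.0, -14.6]))   # about 2.06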
FRACTAL DIMENSIONS OF A SET
- The common concept of dimensionality, as the number of
coordinates needed to locate a point in space, can be
generalized to non-integer values. A set is mathematically
called FRACTAL if this generalized dimension is not an
integer, i.e., if it exceeds the usual (topological)
dimension discussed next.
- More precisely, your familiar concept of dimension is what
mathematicians call the TOPOLOGICAL DIMENSION, which is
recursively defined. A point has dimension zero by
definition. A line or any one-dimensional set is such that
removing a point breaks the set into two distinct
pieces. A surface or any two-dimensional set is broken
into two pieces by removing some one-dimensional set,
e.g., a line, and so on. Topological dimension can be
tricky to apply, e.g., what is the topological dimension
of a set of lines or of a line and a square?
- CAPACITY, also called the BOX-DIMENSION, is obtained
by estimating how the number of boxes of size epsilon in
phase space (or Euclidean space) needed to cover all
points in the set increases (scales) as the size of the
boxes decreases. One defines:
C = Limit[ Log[N[eps]] / Log[1/eps] , eps -> 0 ]
if this limit exists. More intuitively, for many
sets, N[eps] has a POWER-LAW SCALING form as eps -> 0:
N[eps] approx a eps^{-C}
where a is a constant or a slowly varying function of eps.
The exponent C is the usual Euclidean dimension for
familiar sets (a point, a line, a plane, a circle,
a sphere) but generalizes to more interesting sets.
- The capacity is a weak summary of the properties of a
set. Just as curves can have all sorts of different shapes,
yet all have dimension 1, there is an infinity
of different fractal sets that all have the same fractal
dimension.
- Analytical examples of capacity: the Cantor set, the
Koch snowflake, and the set of points {1/i: i >= 1}.
These have capacities of ln(2)/ln(3), ln(4)/ln(3), and 1/2
respectively. As an example, consider how many boxes of
size eps are needed to cover the Cantor set up to M
iterations of the constructive algorithm. The first step
creates N(eps)=2 pieces of size eps=1/3. The second step
creates four pieces N(eps)=4 of size eps=(1/3)^2. After M
recursions, we have N(eps)=2^M and eps=(1/3)^M. We then
try to take the limit of M -> Infinity in the definition
of the capacity:
      D = Limit[ Log[ N[eps] ] / Log[ 1/eps ], eps -> 0 ]
        = Limit[ M Log[2] / (M Log[3]), M -> Infinity ]
        = Log[2] / Log[3]
        approx 0.63093...
You hopefully see the general trick. We have not been
rigorous in this argument in that we have not shown that
this limit is the same as the limit obtained using the
MINIMUM number of boxes of size eps needed to cover the
Cantor set; some more work is needed, but the answer remains
correct.
- The crucial defining property of any fractal is that some
quantity varies as a POWER-LAW under some scaling of
size. Power-laws of any kind are the essential
characteristic of scale-invariant, hence fractal,
behavior. For example, we can define another fractal
dimension, the CORRELATION DIMENSION, by defining a new
quantity C(eps) to be the total number of pairs of points in
the set that are closer to each other than eps. Then if
C(eps) acts as a power law for small eps, i.e., if
      D = Limit[ Log[ C(eps) ] / Log[ eps ], eps -> 0 ]
exists, we again get a fractal dimension,
which in general differs from the capacity.
- Similarly, you could try looking at the smallest distance
between points in a set as a function of the number
of points in the set.
- In the homework, I use still another variation of a
fractal dimension, the CLUSTER dimension of a set. This
is determined as a kind of inverse of the correlation
dimension. Instead of counting how many pairs of points
lie within distance eps (fixing a distance and then
counting), one determines the average radius of a ball
(centered on a given point of the set) that contains N
points (fixing a number and then choosing a typical
distance). Instead of C(eps), we end up with R(N). This
behaves as a power law in N in the limit of small radii, and
the scaling exponent gives a dimension that is generally
different from both the capacity and the correlation
dimensions (see the sketch below).
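  A minimal Python sketch of this fixed-N, variable-radius idea: for
  a sample of centers, find the distance to the N-th nearest neighbor,
  average it to get R(N), and read off the dimension from the slope of
  log N versus log R(N). The brute-force distance computation, sample
  sizes, and test data are illustrative assumptions:

      import numpy as np

      def mean_radius(points, N, n_centers=200):
          # Average distance from a point of the set to its N-th
          # nearest neighbor (brute force).
          idx = np.random.choice(len(points),
                                 size=min(n_centers, len(points)),
                                 replace=False)
          radii = []
          for i in idx:
              d = np.linalg.norm(points - points[i], axis=1)
              d.sort()
              radii.append(d[N])     # d[0] = 0 is the point itself
          return float(np.mean(radii))

      # Since R(N) propto N^(1/D), the slope of log N versus log R(N)
      # gives the dimension. Example: points uniform in the unit
      # square, for which the slope should come out close to 2.
      pts = np.random.rand(5000, 2)
      for N in (8, 16, 32, 64):
          print(N, mean_radius(pts, N))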
- Mathematically, there is no deep distinction among the
various (infinitely many) definitions of dimension.
HOWEVER, numerically, there can be quite large and
practical differences in the effort needed to estimate a
particular dimension. The capacity and other geometric
dimensions (Hausdorff, potential, etc.) do not converge
rapidly enough for time series associated with strange
attractors and are not practical for numerical studies. A
discussion is given in H. S. Greenside et al., Physical
Review A 25, 3453 (1982).
A brief explanation is as follows. An orbit fills out a
set of points (the attractor) in the limit of infinite
time. However, the orbit does not fill out this set
uniformly: it visits some parts of the set rarely and other
parts more frequently. One can mathematically show
(see analytical example in Farmer, Ott, Yorke paper) that
attractors are in fact singular in the sense that the
orbit spends most of its time visiting a vanishingly small
part of the set. For this reason, dimensions such as the
capacity are difficult to apply since one must integrate
for extremely long periods to count the boxes that contain
parts of the attractor that are visited quite rarely. The
correlation dimension takes this probability of visitation
into account automatically and converges faster.
NUMERICAL ESTIMATES OF FRACTAL DIMENSIONS
- The power-law functional form suggests how to identify
fractal dimensions numerically. One plots some log-log
plot such as
Log[10,N[eps]] versus Log[10,1/eps] ,
for the capacity, or
Log[10,C[eps]] versus Log[10,eps] ,
for the correlation dimension and then one looks for
linear behavior in the limit of eps -> 0. This leads to
all kinds of numerical subtleties, since this limit can
not be taken with a finite amount of data. In practice,
one plots the local slope of this log-log plot:
      d log(N(eps)) / d log(1/eps)  versus  log(1/eps) ,
where the local slope can be estimated by simple finite
differences:
      d log(N(eps)) / d log(1/eps)  approx
      ( log( N(eps2) ) - log( N(eps1) ) ) /
      ( log( 1/eps2 ) - log( 1/eps1 ) ) ,
where eps1 and eps2 are successive values of the box size
for which the box counts N(eps) have been determined.
Wherever a power-law form is obeyed, this local slope traces
out a horizontal plateau, and one can read the capacity of
the set directly off the vertical axis.
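  A minimal Python sketch of this local-slope estimate, given arrays
  of box sizes eps and box counts N(eps) (the input arrays here are
  placeholders to be replaced by actual counts):

      import numpy as np

      def local_slopes(eps, counts):
          # Finite-difference slopes of log N(eps) versus log(1/eps)
          # for successive pairs of box sizes.
          x = np.log(1.0 / np.asarray(eps, dtype=float))
          y = np.log(np.asarray(counts, dtype=float))
          return (y[1:] - y[:-1]) / (x[1:] - x[:-1])

      # A plateau in local_slopes(...), plotted against log(1/eps),
      # is read off as the capacity of the set.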
- Dimensions based on invariant measures of flows and maps:
correlation, cluster, information, and an infinity of
others. If these measure-based dimensions are not all the
same, the fractal is called a MULTIFRACTAL. Multifractals
have different scaling exponents in different parts of the
attractor; they are not homogeneous like the Cantor set or
the Koch snowflake.
- Lyapunov dimension, based on the Lyapunov exponents. This is
the best way to calculate fractal dimensions when the
dynamical equations are known, but it can not be used with
just experimental data.
ESTIMATING FRACTAL DIMENSIONS OF TIME SERIES:
- So far, we have assumed that we know a flow or map and
that we know the set of points filled out by an orbit in
the limit of infinite time. In an experiment or
numerically, we may never know the underlying equations or
phase space, and can only measure time series representing
some complex combination (projection) of the phase-space
orbit.
- Key point of this whole business: attractors can be
reconstructed from a single time series, and the phase-space
dimension of the original flow or map is estimated
approximately (and bounded from below) by the fractal
dimension of this attractor. Thus one can estimate how many
variables must
be used in modeling an experiment or computer simulation
WITHOUT KNOWING THE ORIGINAL DYNAMICAL SYSTEM. This
provides a powerful empirical test for chaos: whether a
given time series is produced by a low-dimensional
attractor.
- What makes this work are two simple observations. If a set
B contains a set A, then dim(A) <= dim(B). Thus any phase
space B must be at least the dimension of any attractor
lying in that space. The other observation is that dim(A)
= dim(F(A)) for smooth invertible nonlinear maps F(X) of
E-dim space to E-dim space. Thus dimensions are preserved
under general experiments and changes of coordinates.
- This result has stimulated hundreds of papers in different
disciplines that have tried to estimate the fractal
dimension of various time series, with the goal of
discovering whether there is substantially less complexity
than originally assumed.
- The procedure to extract a fractal dimension proceeds as
follows. First, one embeds the one-dimensional time series
in spaces of increasing dimension E, using some choice of
delay time. The dimension E is called the embedding
dimension and must be systematically increased (a minimal
sketch of the embedding step is given below).
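  A minimal Python sketch of the delay-embedding step (the choice of
  the delay tau is assumed to be made separately, e.g., by trial and
  error or a mutual-information criterion):

      import numpy as np

      def delay_embed(x, E, tau):
          # Embed a scalar time series x in E dimensions with delay tau.
          # Row i of the result is (x[i], x[i+tau], ..., x[i+(E-1)*tau]).
          x = np.asarray(x, dtype=float)
          n = len(x) - (E - 1) * tau
          return np.column_stack([x[j * tau: j * tau + n] for j in range(E)])

      # Example usage: vectors = delay_embed(signal, E=5, tau=10)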
- For each choice of embedding dimension E, estimate a fractal
dimension D[E]. It is best not to use the definition of
capacity, but rather a definition based on the measure,
e.g., the pointwise dimension or some average such as the
correlation exponent or the cluster dimension.
- The fractal dimension D[E] is obtained by making a log-log
plot and looking for straight lines. Alternatively, and
better, it is obtained by plotting the derivative (local
slope) of the log-log plot directly, which reveals fine
structure. This slope curve generally shows a lot of
structure, which makes the determination of a scaling
(power-law) region quite difficult or
ambiguous. Even for a 'simple' low-dimensional set like
that generated by the Henon map, one finds that the slope
versus log plot does not become perfectly flat in the
limit of infinitely many data points because of the
singular measure of the Henon set.
- Assuming the ambiguities of the scaling region are
resolved (or accepted) in some way, one then plots D[E]
versus E. If Df is the actual fractal dimension, one
expects:
D[E] = E for E < Df
D[E] = Df for E > Df
Thus it is easy to read off the fractal dimension from this
plot.
FINITE-DIMENSIONAL STOCHASTIC TIME SERIES:
- WARNING. Unfortunately a finite fractal dimension does
NOT indicate by itself that the dynamics is deterministic,
although it is true that uncorrelated random (white-noise)
sequences have infinitely large fractal dimensions. See the
discussion by A. R. Osborne and A. Provenzale, "Finite
correlation dimension for stochastic systems with
power-law spectra", Physica D 35, 357-381 (1989).
- Stochastic time series can have finite fractal
dimensions, e.g., Brownian motion (D=2) or, more
generally, fractional Brownian motion, which can have
arbitrarily large fractal dimensions with D>1. See
Mandelbrot's book or Feder's book for elementary
discussions.
- Possible to have short-time fractal (nondifferentiable)
structure that fools algorithms based on long-time
recurrence. But only known examples have power-law power
spectra so are easily identified.
AMOUNT OF DATA NEEDED GROWS EXPONENTIALLY WITH D ==> D <= 7
- For any definition of dimension, the amount of data needed
for a numerical dimension estimate grows EXPONENTIALLY
rapidly with the dimension of the attractor. This is
easily seen for the capacity, where the number of boxes of
a given size eps needed to cover a set grows as
      eps^{-C} = Exp[ C log(1/eps) ] ,
i.e., exponentially with increasing C. This means that as a
set becomes higher and higher dimensional, it takes more and
more points to determine this high-dimensional structure.
- In all studies to date, there is a rough consensus that a
fractal dimension D >= 7 can not be estimated accurately,
since the amount of data needed in a time series would be
tens or hundreds of millions of points, which is impractical.
restriction on dimension estimates does not hold for the
so-called Lyapunov dimension, but this can ONLY be
calculated if the underlying equations are known.
- This should not be too surprising. A given orbit traces
out an attractor in the limit of infinite time. If the
attractor has high dimension, one has to follow the
orbit longer and longer (more and more points) to
generate the points that effectively flesh out the
high-dimensional structure. For example, think of an
embedded quasiperiodic time series tracing out a
donut-shaped (torus) attractor. For short times, the orbit
is just a line. For
longer times, the orbit starts coming arbitrarily close to
points that have been generated before and it is precisely
these near repeats separated by large intervals of time
that slowly flesh out the two-dimensional structure of the
attractor.
- What happens if one does not have enough data, e.g., if
one tries to estimate the fractal dimension of a
10-dimensional attractor given just 500 points?
Unfortunately, one still often observes reasonably
unambiguous scaling regions that are invariant under
increasing embedding dimension. This is a simple artifact:
a small set of points at some resolution will always look
approximately like a line or a plane, etc. Thus one has to
work very hard and systematically to verify that a
dimension estimate is reasonable.
NUMERICAL RESOLUTION GROWS RAPIDLY WITH REYNOLDS NUMBER:
- The Kolmogorov theory of homogeneous turbulence makes an
interesting prediction about the effective fractal
dimension or size of phase space needed to simulate fluid
turbulence. A nice summary is given on page 408 of the
Manneville book, with further references.
- In many fluid experiments, there is a flow with a
characteristic velocity V (e.g., the speed of fluid
entering a pipe or the mean flow of air past an airplane
wing). The fluid flows past an object with characteristic
length L (e.g., the diameter of the pipe or wing). The
fluid is generally viscous, with a kinematic viscosity nu
with units of meter^2/sec. From these three variables,
one can form a single dimensionless combination known as
the Reynolds number:
Re = V L / nu .
This ratio measures the degree to which the system is driven
out of equilibrium, with the numerator being proportional
to the driving (velocity) and the denominator giving the
dissipation.
- The Kolmogorov theory predicts that the fractal dimension
of the flow is related to the Reynolds number Re in the
following way:
D = a Re^(9/4) ,
i.e., it grows as the 2.25 power of the Reynolds number.
This is bad news for fundamental fluid simulations since
typical Reynolds numbers may be of order 10^5, so D may be
of order 10^11 (give or take an order of magnitude or two),
a huge number. In fact, the above relation is an
overestimate but the exact relation is not understood.
- Numerical simulations of pdes such as Navier-Stokes
begin by discretizing the spatial domain. In finite
differences, e.g., one assumes that all quantities
(pressure, temperature, velocity fields) are known only at
discrete mesh points. The numerical simulation introduces
a phase space whose dimension is the total number of spatial
degrees of freedom, e.g., Nx Ny Nz where Nx, Ny, Nz are the
numbers of spatial mesh points in the three-dimensional
domain. Since any attractors of the numerical simulation
live in this phase space, one must have:
Nx Ny Nz >= Re^(9/4) ,
which provides a lower bound on the numerical resolution.
In fact, one can not really choose the spatial resolution
close to the minimal number of modes indicated by the
fractal dimension, so a numerical simulation must have even
more degrees of freedom to correctly resolve a turbulent
flow (see the sketch below).
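  A minimal Python sketch of this resolution bound (the prefactor of
  the Kolmogorov estimate is set to one here, which is itself a crude
  assumption):

      def minimum_mesh_points(Re, prefactor=1.0):
          # Rough lower bound on Nx*Ny*Nz from D ~ a Re^(9/4).
          return prefactor * Re ** 2.25

      for Re in (1.0e2, 1.0e3, 1.0e4, 1.0e5):
          print(Re, minimum_mesh_points(Re))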
- This leads to an interesting but relatively unexplored
use of fractal dimensions, as ways of verifying that
adequate numerical resolution is used. Just as one
increases the embedding dimension E of a time series until
the fractal dimension becomes independent of E, one should
increase spatial resolution in a pde solver until the
fractal dimension of the dynamics becomes independent of
that resolution. At that point, the dynamics are correctly
resolved. An extremely interesting but unsolved question
is how to design numerical simulations that have the
smallest numerical mesh consistent with the fractal
dimension of the dynamics. A solution to this problem
could increase computational capabilities by orders of
magnitude.
UNSOLVED QUESTIONS CONCERNING FRACTALS:
- Not understood why fractals occur so often in nature,
e.g., there is not yet a good theory that explains why
diffusion limited aggregation leads to fractals or, even
assuming fractal behavior, why the fractal dimension is
about 1.7. Similarly, it is poorly understood why galaxies
are organized in a fractal pattern. Simple mechanisms are
known, but there are no proofs of why these mechanisms work.
- Not yet understood how to extract the actual few degrees of
freedom implied by a small fractal dimension, i.e., how to
obtain dynamical equations. The fractal dimension is only a
very simple way to summarize a complex situation.
- Not yet understood how the fractal dimension varies
generally with bifurcation parameters, e.g., can it jump
rapidly from small to large values?
- Very expensive and difficult to calculate fractal
dimensions for higher-dimensional fractals since the number
of data points needed grows exponentially with the dimension
D. Are there better ways to study the data? What other
invariants are experimentally accessible?