MathGroup Archive: March 2000 [00558]

[Date Index] [Thread Index] [Author Index]
Re: Re: N-Dimensional line
To: mathgroup at smc.vnet.net
Subject: [mg22807] Re: [mg22798] Re: [mg22785] N-Dimensional line
From: Bojan Bistrovic <bojanb at python.physics.odu.edu>
Date: Fri, 31 Mar 2000 01:01:11 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com
> The built-in function Fit does what you want.
> 
> Let the dimension of the space be
> 
> dim = 5;
> 
> The variables to define the space are
> 
> vars = ToExpression[Table["x" <> ToString[k], {k, dim}]];
> 
> The functions to define the line are
> 
> funcs = Join[{1}, vars];
> 
> To generate test data
> 
> coef = 10*Table[Random[], {dim + 1}];
> 
> Clear[f];
> 
> f[x_List] := First[coef] + (Rest[coef].x) /; Length[x] == dim
> 
> data = Join[#, {f[#]}] & /@ Table[Random[], {dim + 2}, {dim}];
> 
> To Fit a line to the generated test data
> 
> Fit[data, funcs, vars]
> 
> 3.9635834515677497 + 4.0699930252986904*x1 + 
>   5.430269807608436*x2 + 5.302060419354726*x3 + 
>   8.430356457003278*x4 + 1.4150730899130353*x5
> 
> Verifying that this is the same line as that used to generate the test data
> 
> % - f[vars] // Chop
> 
> 0
> 
> Bob Hanlon
> 
> In a message dated 3/25/2000 5:11:54 AM, birdy00 at bu.edu writes:
> 
> >Does anyone have a reference and/or Mathematica code for finding the
> >line
> >in N-dimensional space that minimizes the distances from this line to a
> >set of
> >points in this space?
> >
> 

Bob, your code fits data to a N-1 dimensional plane (5-plane) in N 
dimensional space (6-space), not a curve. Curve is defined as a set of points
parametrized with ONE parameter and you have 5. Curve in N-dimensional space
can (sometimes) be parametrized as either x[N]=f[x[1],x[2], ... ,x[N-1] ] or 
as {x[1]=f1[t],x[2]=f2[t], ... , x[N]=fN[t]} where t is a parameter. 

Virgil, finding the curve that "minimizes the distances from this line to 
a set of points in this space" isn't trivial and depends on the type of 
curve you want. Here's an example of least square fit to a straight line 
in 2 dimensions.  Least square fit minimizes the square of differences 
between the function values (f[x] or y) and the point values x[n]:

chi=Sum[( y[x[k], a[0],a[1]] - y[k] )^2, {k,1,m}]

where 'm' is the number of data points and the line is parametrized as
y[x]=a[0]+ a[1]*x ; if this is to be minimal, then derivatives 
{D[chi,a[1]], D[chi,a[2]]} must be equal to zero. This gives you 
the set of equations 

{Sum[(a[0]+ a[1]*x[k] - y[k] )* x[k], {k,1,m}]==0 ,
 Sum[(a[0]+ a[1]*x[k] - y[k] ), {k,1,m}]==0 }

or

{a[0]*Sum[x[k],{k,1,m}] + a[1]*Sum[x[k]^2,{k,1,m}] ==  Sum[y[l]*x[k],{k,1,m}] ,
 a[0]* m + a[1]* Sum[x[k],{k,1,m}] == Sum[ y[k], {k,1,m}] }

This is the system of two equations for two variables which can be solved to
give you a[0] and a[1]. This is what built-in function "Fit" does.
Generalization of this to finding the straight 2-dimensional plane in 3
dimensional space is trivial: you parameterize the plane as 
z[x,y]=a[0] + a[1]*x +a[2]*y and repeat the procedure. the same holds for other
types of curves in 2D space; you parameterize the parabola as 
y[x]=a[0]+ a[1]*x + a[2]*x^2 and the rest is the same. "Fit" can do all of
this. 

The same procedure can be extended to a line in N+1-dimensional space:
if you have a set of 'm' points

{ {x1[1],x2[1],...,x(N+1)[1]},{x1[2],x2[2],..., x(N+1)[2]}, ... ,
	{x1[M],x2[m],...,x(N+1)[m]} }

and you parameterize your curve with a set of functions

x(N+1)= f[x1,x2, ... , xN, a[1],a[2],...,a[n]]

where {a[1],...,a[n]} are some (unknown) parameters, then you require that

chi= Sum[ (f[x1,x2,...,xN,a[1],a[2],...,a[n]] - x(N+1) ) /.
	{x[1]->x1[k], x[2]-> x2[k], ..., xN->xN[k]} ,{k,1,M}]

is minimal by requiring that all derivates 

{ D[chi,a[1]], D[chi,a[2]] ,..., D[chi,a[N]]} 

are equal to zero. Then you repeat the procedure for solving the equations for
a[1], ..., a[n].

Note a few things: 

1.) x(N+1) DOESN'T mean x*(N+1); it is just a way ugly example of variable 
name. it means: 

ToSymbol[StringJoin["x",ToString[N+1]]

2.) The number of parameters 'n' has nothing to do with the dimension of 
space 'N', but it DOES to be smaller then the number of points 'm+1'. 

3.) unless your curve is polynomial of the first order in parameters a[k],
you're not going to get a linear system of equations for a[k] and that's the
only one that's easily solvable. Nonlinear systems might not have solutions at
all.

4.) the choice of minimizing the square of distance in (N+1)-th dimension is
arbitrary. there's no reason for choosing that particular dimension. You could
choose any dimension, or sum of any or all dimensions as a thing you want to
minimize. different choice will give you different curve.

5.) the choice of the SQUARE of distance that for minimizing is also
arbitrary. in principle you could choose any positive definitive function and
minimize that (for example, 
Exp[ (f[x1,x2,...,xN,a[1],a[2],...,a[n]] - x(N+1) )/Lambda] 
where Lambda is some scale. This will again give you different curve. The main
reason (in my opinion anyway) for choosing the square is that you get a nice
linear system of equations for your parameters which is easily solvable (The
"real" reason has something to do with Gauss distribution of errors, if I
remember my statistics classes correctly).

6.) all of the above assumes you CAN parameterize your curve as
x(N+1)=f[x1,x2,...,xN,a[1],a[2],...,a[n]]. Your curve could be something
that either can't be expressed this way, or even if it can be done in
principle, you might not know how to do it in real life. This will also
depend a lot on the coordinate system you use (Cartesian vs. polar for
example). 

The bottom line is: there's no general procedure. You have to assume a form of
curve in advance and then find the parameters that fit your data best. I hope
this helps a little. For more information ask someone in Mathematics and
statistics department at your university.

Bye, Bojan

--
---------------------------------------------------------------------
Bojan Bistrovic,                                      bojanb at jlab.org
Old Dominion University, Norfolk VA & Jefferson Lab, Newport News, VA
---------------------------------------------------------------------
Prev by Date: Re: Real-time plotting
Next by Date: Re: Trying to define: Fractional Derivatives & Leibniz' display form for output and templates
Previous by thread: Re: N-Dimensional line
Next by thread: Problems with UnitStep and DiracDelta