Re: Re: N-Dimensional line
- To: mathgroup at smc.vnet.net
- Subject: [mg22807] Re: [mg22798] Re: [mg22785] N-Dimensional line
- From: Bojan Bistrovic <bojanb at python.physics.odu.edu>
- Date: Fri, 31 Mar 2000 01:01:11 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
> The built-in function Fit does what you want. > > Let the dimension of the space be > > dim = 5; > > The variables to define the space are > > vars = ToExpression[Table["x" <> ToString[k], {k, dim}]]; > > The functions to define the line are > > funcs = Join[{1}, vars]; > > To generate test data > > coef = 10*Table[Random[], {dim + 1}]; > > Clear[f]; > > f[x_List] := First[coef] + (Rest[coef].x) /; Length[x] == dim > > data = Join[#, {f[#]}] & /@ Table[Random[], {dim + 2}, {dim}]; > > To Fit a line to the generated test data > > Fit[data, funcs, vars] > > 3.9635834515677497 + 4.0699930252986904*x1 + > 5.430269807608436*x2 + 5.302060419354726*x3 + > 8.430356457003278*x4 + 1.4150730899130353*x5 > > Verifying that this is the same line as that used to generate the test data > > % - f[vars] // Chop > > 0 > > Bob Hanlon > > In a message dated 3/25/2000 5:11:54 AM, birdy00 at bu.edu writes: > > >Does anyone have a reference and/or Mathematica code for finding the > >line > >in N-dimensional space that minimizes the distances from this line to a > >set of > >points in this space? > > > Bob, your code fits data to a N-1 dimensional plane (5-plane) in N dimensional space (6-space), not a curve. Curve is defined as a set of points parametrized with ONE parameter and you have 5. Curve in N-dimensional space can (sometimes) be parametrized as either x[N]=f[x[1],x[2], ... ,x[N-1] ] or as {x[1]=f1[t],x[2]=f2[t], ... , x[N]=fN[t]} where t is a parameter. Virgil, finding the curve that "minimizes the distances from this line to a set of points in this space" isn't trivial and depends on the type of curve you want. Here's an example of least square fit to a straight line in 2 dimensions. Least square fit minimizes the square of differences between the function values (f[x] or y) and the point values x[n]: chi=Sum[( y[x[k], a[0],a[1]] - y[k] )^2, {k,1,m}] where 'm' is the number of data points and the line is parametrized as y[x]=a[0]+ a[1]*x ; if this is to be minimal, then derivatives {D[chi,a[1]], D[chi,a[2]]} must be equal to zero. This gives you the set of equations {Sum[(a[0]+ a[1]*x[k] - y[k] )* x[k], {k,1,m}]==0 , Sum[(a[0]+ a[1]*x[k] - y[k] ), {k,1,m}]==0 } or {a[0]*Sum[x[k],{k,1,m}] + a[1]*Sum[x[k]^2,{k,1,m}] == Sum[y[l]*x[k],{k,1,m}] , a[0]* m + a[1]* Sum[x[k],{k,1,m}] == Sum[ y[k], {k,1,m}] } This is the system of two equations for two variables which can be solved to give you a[0] and a[1]. This is what built-in function "Fit" does. Generalization of this to finding the straight 2-dimensional plane in 3 dimensional space is trivial: you parameterize the plane as z[x,y]=a[0] + a[1]*x +a[2]*y and repeat the procedure. the same holds for other types of curves in 2D space; you parameterize the parabola as y[x]=a[0]+ a[1]*x + a[2]*x^2 and the rest is the same. "Fit" can do all of this. The same procedure can be extended to a line in N+1-dimensional space: if you have a set of 'm' points { {x1[1],x2[1],...,x(N+1)[1]},{x1[2],x2[2],..., x(N+1)[2]}, ... , {x1[M],x2[m],...,x(N+1)[m]} } and you parameterize your curve with a set of functions x(N+1)= f[x1,x2, ... , xN, a[1],a[2],...,a[n]] where {a[1],...,a[n]} are some (unknown) parameters, then you require that chi= Sum[ (f[x1,x2,...,xN,a[1],a[2],...,a[n]] - x(N+1) ) /. {x[1]->x1[k], x[2]-> x2[k], ..., xN->xN[k]} ,{k,1,M}] is minimal by requiring that all derivates { D[chi,a[1]], D[chi,a[2]] ,..., D[chi,a[N]]} are equal to zero. Then you repeat the procedure for solving the equations for a[1], ..., a[n]. Note a few things: 1.) x(N+1) DOESN'T mean x*(N+1); it is just a way ugly example of variable name. it means: ToSymbol[StringJoin["x",ToString[N+1]] 2.) The number of parameters 'n' has nothing to do with the dimension of space 'N', but it DOES to be smaller then the number of points 'm+1'. 3.) unless your curve is polynomial of the first order in parameters a[k], you're not going to get a linear system of equations for a[k] and that's the only one that's easily solvable. Nonlinear systems might not have solutions at all. 4.) the choice of minimizing the square of distance in (N+1)-th dimension is arbitrary. there's no reason for choosing that particular dimension. You could choose any dimension, or sum of any or all dimensions as a thing you want to minimize. different choice will give you different curve. 5.) the choice of the SQUARE of distance that for minimizing is also arbitrary. in principle you could choose any positive definitive function and minimize that (for example, Exp[ (f[x1,x2,...,xN,a[1],a[2],...,a[n]] - x(N+1) )/Lambda] where Lambda is some scale. This will again give you different curve. The main reason (in my opinion anyway) for choosing the square is that you get a nice linear system of equations for your parameters which is easily solvable (The "real" reason has something to do with Gauss distribution of errors, if I remember my statistics classes correctly). 6.) all of the above assumes you CAN parameterize your curve as x(N+1)=f[x1,x2,...,xN,a[1],a[2],...,a[n]]. Your curve could be something that either can't be expressed this way, or even if it can be done in principle, you might not know how to do it in real life. This will also depend a lot on the coordinate system you use (Cartesian vs. polar for example). The bottom line is: there's no general procedure. You have to assume a form of curve in advance and then find the parameters that fit your data best. I hope this helps a little. For more information ask someone in Mathematics and statistics department at your university. Bye, Bojan -- --------------------------------------------------------------------- Bojan Bistrovic, bojanb at jlab.org Old Dominion University, Norfolk VA & Jefferson Lab, Newport News, VA ---------------------------------------------------------------------