MathGroup Archive: April 2011 [00237]

[Date Index] [Thread Index] [Author Index]
Re: fitting a parametric curve
To: mathgroup at smc.vnet.net
Subject: [mg118043] Re: fitting a parametric curve
From: Ray Koopman <koopman at sfu.ca>
Date: Sat, 9 Apr 2011 17:59:11 -0400 (EDT)
References: <inmg5q$pvu$1@smc.vnet.net>
On Apr 8, 1:14 am, Sebastian Hofer <sebho... at gmail.com> wrote:
> I'm currently working on a problem, which I find interesting both from a 
conceptual and a programing point of view. I present my solution below, I'm
 pretty sure there is a lot of room for improvement. Specifically I would be interested how to optimize the performance of the algorithm, but I'm also happy with general suggestions (if there is a better way to do it,... Maybe this is even a trivial problem and it is already solved in Mathematica, but I wasn't able to find it).
> The problem is the following: I have a function which is given in a parametric form as f[t]={fx[t],fy[t]} and depends on some parameters c={c1,c2,...,cn}. I now want to fit f to a list of (experimental) data points {f[t0],f[t1],...} to determine c. What makes this interesting is that the values t0,t1,... are not known.
> My idea to tackle this was the following:
> - For a given tuple c, sample f at some (appropriately chosen) points s0, s1,...
> - calculate the minimal total distance between ft={f[t0],f[t1],...} and fs={f[s0],f[s1],...}, i.e. for each point in ft find the closest one from fs and sum up the distances.
> - Minimize the resulting function with respect to c
>
> Implementing this resulted in the code below. It is separated into the following functions:
> - NAntiderivativeInverse calculates the curve length as a function of the curve parameter t.
> - GenerateFitData samples f[t] to create the vector ft. To do a decent sampling (=equidistant sampling with respect to curve length) first NAntiderivativeInverse is called.
> - TotalLeasSquares calculates the minimal total distance of the sampled and the experimental data set.
> - ParametricFit essentially puts all of the above together and passes the
 result of TotalLeasSquares to NMinimize/FindMinimum.
>
> The algorithm works so far but it is pretty slow if you go to a decent number of sample points. The 2 functions which do all the work are GenerateFitData and TotalLeastSquares (these are the ones called many times consecutively during minimization). So, the obvious question is how to improve their performance. Specifically:
> - GenerateFitData seems pretty straight forward, I don't think that there is lot of room for optimization there. I tried to Compile[] it, but I'm never quite sure how to do this correctly (i.e. in a way which gives me a speedup) so this may not help much.
> - TotalLeastSquares: I guess there might be a Million different ways to do this, if anyone can come up with a faster way I would be glad! If not: what are feasible options to optimize this with respect to compilation, outsourcing it to external programs (C,...)?
>
> As an aside: Is there a better way to calculate the curve length as a function of t numerically? I couldn't think of any. This is not a big issue speedwise though, as I do that for one specific parameter set c only. Of course the resulting function changes for different c, but I expect that this does not have a large influence on the result of the optimization when I go to a large number of sampled points.
>
> Also, I need to find a way to judge the quality of the fit and to incorporate experimental errors. This is of course a very general problem, still if someone has insights or pointers on that, I would be happy to hear them!
>
> Looking forward to all comments!
>
> Best regards,
> Sebastian
>
> (*
>         calculate inverse curvelength, i.e. InverseLength[L]:=Solve[Integrate[df x[t]^2+dfy[t]^2,{t,0,T}]==L,T].
>         So if I want a point half-way through the curve (at L/2), InverseLength[L/2] gives me the corresponding value for the
>         parameter t. Note that this is normalized sucht that L=1 (so half-way po int would be InverseLength[1/2]).
> *)
> NAntiderivativeInverse[funcn_,range_List,options:OptionsPattern[]]:= Block[
>                 {
>                         stepsize,totallength,tdata,
>                         x1=0,
>                         steps=OptionValue[SamplePoints]
>                 },
>                 stepsize=Abs[Subtract@@Rest@range]/steps;
>                 totallength=NIntegrate[funcn,range];
>
>                 (*slice the full interval into parts, integrate each and add up the results*)
>                 tdataTable[{NIntegrate[funcn,Evaluate@{First@range,x1,x0}]/totallength,-x1+(x1=x0)},
>                                 {x0,First@Rest@range,Last@Rest@range,stepsize}];
>                 Return@Interpolation@Table[Total@tdata[[1;;n]],{n,1,Length@tdata}]
>         ];
> Options[NAntiderivativeInverse]=Join[{SamplePoints->300},Options[NIntegrate]];
>
> (*
>         sample f equidistantly and return list of points fs={f[s0],f[s1],...}.
>         fscale is the output of NAntiderivativeInverse
> *)
> GenerateFitData[fx_,fy_,fscale_,parameters_,options:OptionsPattern[]]:=
>         Module[
>                 {samplepoints=OptionValue[SamplePoints]},
>                 Return@Table[{fx[fscale@x,Sequence@@parameters],fy[fscale@x,Sequence@@parameters]},{x,0,1,1/samplepoints}]
>         ];
> Options[GenerateFitData]={SamplePoints->300};
>
> (*
>         for each point in data (experimental data) find the closest point in fdata(sampled data)
>         calculate the distance and sum up for all points
> *)
> TotalLeastSquares[data_,fdata_]:=Total[Min/@Outer[EuclideanDistance,data,fdata,1]];
>
> ParametricFit[sdata_List,{sfunc_List,constraints__},fitparameters_List,varrange_List,options:OptionsPattern[]]:=
>         Block[
>         {
>                 fitparametername,init,initpoints,mf,args,
>                 fx,fy,dfx,dfy,
>                 fscale,ferror,fe,gfd,
>                 minres,
>                 varlist=Prepend[First/@fitparameters,First@varrange],
>                 method=OptionValue[FitMethod],
>                 compile=OptionValue[Compile],
>                 curvelengthoptions=OptionValue[CurveLengthOptions]
>         =09
>         },
>         (*extract the names of the fitted variables*)
>         fitparametername=Switch[#,List[__List],First/@#,List[__],#]&@fitparamete rs;
>         (*use FindMinimum or NMinimize*)
>         mf=Switch[method,"FindMinimum",FindMinimum,"NMinimize",NMinimize,_,FindMinimum];
>
>         (*determine initial points for calculation of curvelength*)
>         initpoints=FilterRules[curvelengthoptions,InitialPoints];
>         init=Switch[If[initpoints=!={},initpoints,Automatic],
>                         List[_Rule..],ReplaceAll[fitparametername,initpoints],
>                         List[_?AtomQ..],initpoints,
>                         _,Switch[#,(*Automatic*)
>                                 List[__List?(Length@#==2&)],Last/@#,
>                                 List[__List?(Length@#==3&)],Map[(First@#+Last@#)/2&,Rest/@#]]&@fitparameters
>                 ];
>
>         (*f[t]={fx[t],fy[t]}*)
>         fx=Function[Evaluate@varlist,Evaluate@First@sfunc];
>         fy=Function[Evaluate@varlist,Evaluate@Last@sfunc];
>
>         (*gradient of f is needed to calculate the curve length*)
>         dfx=Function[Evaluate@varlist,Evaluate@D[fx@@varlist,First@varrange]];
>         dfy=Function[Evaluate@varlist,Evaluate@D[fy@@varlist,First@varrange]];
>
>         (*calculate inverse curve length as a function of t*)
>         fscale=NAntiderivativeInverse[
>                 Sqrt[dfx[First@varrange,Sequence@@init]^2+dfy[First@varrange,Sequence@@init]^2],
>                 varrange,Evaluate@FilterRules[{curvelengthoptions},Options[NAntiderivativeInverse]]];
>
>         (*generate sampled data for given parameters*)
>         gfd=Switch[compile,
>                 True,Compile[Evaluate@Map[{#,_Real}&,fitparametername],
>                         Evaluate@GenerateFitData[fx,fy,fscale,fitparametername,
>                                 Evaluate@FilterRules[{options},Options[GenerateFitData]]]],
>                 False,Function[Evaluate@Map[#&,fitparametername],
>                         Evaluate@GenerateFitData[fx,fy,fscale,fitparametername,
>                                 Evaluate@FilterRules[{options},Options[GenerateFitData]]]]];
> =09
>         (*error function - to be minimized*)
>         ferror[pars__]:=TotalLeastSquares[sdata,gfd@pars];
>         fe[pars__]:=ferror[pars]/;And@@NumericQ/@{pars};
>
>         (*minimize using FindMinimum/NMinimize*)
>         If[constraints===Null,
>                 args=fe[Sequence@@fitparametername],
>                 args={fe[Sequence@@fitparametername],constraints}];
>         minres={Sow@#1,#2}&@@mf[args,Evaluate@fitparameters];
>         Return@Last@minres;
> ];
> ParametricFit[data_List,func_List,fitparameters_List,varrange_List,options:= OptionsPattern[]]:=
>         ParametricFit[data,{func,Null},fitparameters,varrange,options];
> Options[ParametricFit]={FitMethod->"FindMinimum",Compile->True,CurveLengthOptions->{}};
>
> ParametricFit::usage =
>   "ParametricFit[data, {fx, fy}, {{a, a0}, ...}, {t, t0, t1}, \
> FitMethod \[Rule] \"FindMinimum\"]
>   ParametricFit[data, {{fx, fy}, constraints}, {{a, a0}, ...}, {t, \
> t0, t1}, FitMethod \[Rule] \"FindMinimum\"]
>   ParametricFit[data, {{fx, fy}, constraints}, {{a, a0}, ...}, {t, \
> t0, t1}, FitMethod \[Rule] \"NMinimize\"]
>
>   Notes:
>    - First argument data is a list of tuples, i.e. \
> {{x1,y1},{x2,y2},...}
>    - The third argument defines the fit parameters, see the \
> documentation of FindMinimum/NMinimize for usage instructions.
>    - The last argument {t, t0, t1} refers to the parametric \
> dependence of the fit functions, i.e. fx = fx[t], fy = fy[t]; fx and \
> fy are (more or less) uniformly sampled over the interval {t0,t1}. \
> The number of sample points can be adjusted by using \
> CurveLengthOptions\[Rule]{SamplePoints\[Rule]n} (default is n=300).
>
>   Options:
>    - FitMethod: see above
>    - CurveLengthOptions: List of options used for calculation the \
> curve length
>         - SamplePoints: Number of points sampled for interpolation of \
> curve length.
>         - InitialPoints: Specifies values for fit parameters for which \
> curve length is evaluated. Can be one of the following
>                 - {u\[Rule]u0,v\[Rule]v0,...}
>                 - {u0, v0,...} in order of appearance
>                 - Automatic
>    - All other (appropriate) options are passed to \
> FindMinimum/NMinimize.
>   ";

Are the t-values ordered (t0 < t1 < ... )? Are fx and fy monotone in
the region of interest, so that (in principle) they can be inverted?

Do you really need to sample the points?
How many points do you have, and how many parameters?

TLS assumes the errors are random, uncorrelated, and homoscedastic,
both from point to point and in both directions for each point.
You can modify the fitting to handle various departures from those
assumptions, but is this the same data whose errors you said in a
previous post (Feb 14) were deterministic, due to failures in a
numerical minimization routine?
Prev by Date: Re: Managing your notebooks
Next by Date: Re: Managing your notebooks
Previous by thread: fitting a parametric curve
Next by thread: Recognize multiple outputs