Re: RE: Re: Newbie Question
- To: mathgroup at smc.vnet.net
- Subject: [mg32819] Re: [mg32776] RE: [mg32728] Re: [mg32686] Newbie Question
- From: Sseziwa Mukasa <mukasa at jeol.com>
- Date: Thu, 14 Feb 2002 01:43:38 -0500 (EST)
- Organization: JEOL (USA) Ltd.
- References: <200202091011.FAA16310@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
"Wolf, Hartmut" wrote: > > Sseziwa, > > this is the right way to deal with the problem, I think. Here > just a little modification of this idea, and an other one following > a suggestion from Andrzej Kozlowski: > > > In[1]:= ll = Table[Random[Real, {0, 100}], {300}]; > > your suggestion: > > In[2]:= > f[x_, m_Integer] := Block[{a = Drop[x, m]}, (a - Drop[x, -m])/a] > In[3]:= > r1 = Table[f[ll, i], {i, Length[ll] - 1}]; > > my variant: > > In[4]:= > r2 = (#1 - #2)/#1 & @@@ > Drop[NestList[{Drop[First[#], 1], Drop[Last[#], -1]} &, {ll, ll}, > Length[ll] - 1], 1]; > > In[5]:= r1 == r2 > Out[5]= True > > The idea was: dropping the first or last element of a list > is faster than dropping half of it. > I've tried it myself and this seems to be the case which is unsurprising, Drop is probably an order n operation on a list. However, I would not have expected this to be the case using packed arrays but they don't change the timing at all. > > Another idea (trying to use efficient list operations): > > <deleted> Timing however excludes this variant; compare the other two: > > > In[9]:= ll = Table[Random[Real, {0, 100}], {4000}]; > > In[16]:= > (r = Table[f[ll, i], {i, Length[ll] - 1}]); // Timing > Out[16]= > {20.57 Second, Null} > In[17]:= Remove[r]; > > In[18]:= > (r = (#1 - #2)/#1 & @@@ > NestList[{Drop[First[#], 1], Drop[Last[#], -1]} &, > {Drop[ll, 1], Drop[ll, -1]}, > Length[ll] - 2]); // Timing > Out[18]= > {13.359 Second, Null} > In[19]:= Remove[r]; > > Your variant however is more economic with memory, and I was > not able to run the test with a list of 10000. (400 MHz P II > Notebook, 192 MB real memory). I estimate that 10000 will take > about 3 minutes. This would mean 10^8 data will at least take > several days, perhaps too much to make any predictions. > For an input list of size n the resulting data set is going to be of size 0.5 * (n^2 - n) each element of which is the result of a subtract and divide operation. 
Assuming a rate of one operation per processor cycle, it will take more than 570 days to process your entire data set on a 400 MHz processor, regardless of the language used. Do you really need the percent differences at all lags?

> Anyway, I agree with you that this is not the right task for
> Mathematica, producing a vast quantity of numbers of little
> information content, from which you will finally have to read
> off something: what? how?

If Mathematica had a file-seek operator for writing to an arbitrary position in a file, it probably wouldn't be much worse than any other programming language at solving this problem. Writing to an arbitrary file position is a file-system-dependent capability, though, and I'm not sure whether Windows file systems support it or whether Mathematica exposes it. Since Mathematica can only write sequentially, however, I suppose you could do the following:

f[infile_, outfile_, n_, m_] :=
  Block[{a, b, instrma = OpenRead[infile], instrmb = OpenRead[infile],
      outstrm = OpenWrite[outfile], rem},
    Do[
      Skip[instrmb, Real, lag];
      Do[
        a = ReadList[instrma, Real, m];
        If[lag > m,
          b = ReadList[instrmb, Real, m],
          SetStreamPosition[instrmb, StreamPosition[instrma]];
          b = Flatten[{a[[Range[lag + 1, m]]],
                ReadList[instrmb, Real, lag]}]];
        Write[outstrm, #] & /@ ((b - a)/b),
        {Quotient[n - lag, m]}];
      rem = Mod[n - lag, m];
      If[rem != 0,
        a = ReadList[instrma, Real, rem];
        If[lag > rem,
          b = ReadList[instrmb, Real, rem],
          b = Flatten[{a[[Range[lag + 1, rem]]],
                ReadList[instrmb, Real, lag]}]];
        Write[outstrm, #] & /@ ((b - a)/b)];
      SetStreamPosition[instrma, 0];
      SetStreamPosition[instrmb, 0],
      {lag, n - 1}];
    Close[instrma]; Close[instrmb]; Close[outstrm]]

Here infile and outfile are the names of the input and output files respectively, n is the length of the data set, and m is the maximum length of any list to be used in the calculation. It is assumed that m < n. (Note the parentheses around (b - a)/b: without them, Map binds tighter than division and the expression parses incorrectly.)
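For comparison, the same two-reader streaming idea can be sketched in Python for a single lag (a hypothetical example assuming one real number per line in the input file; the names stream_percent_diffs, infile, and outfile are mine, not from the original):

```python
# Minimal sketch of the two-reader streaming strategy: one reader runs `lag`
# values ahead of the other, so each (b - a)/b can be written out without
# ever holding the whole data set in memory.

def stream_percent_diffs(infile, outfile, lag):
    """For each i, write (x[i+lag] - x[i]) / x[i+lag] to outfile."""
    with open(infile) as fb, open(infile) as fa, open(outfile, "w") as out:
        for _ in range(lag):                  # advance the leading reader
            next(fb)
        for line_b, line_a in zip(fb, fa):    # stops after n - lag pairs
            a, b = float(line_a), float(line_b)
            out.write(f"{(b - a) / b}\n")
```

Looping lag from 1 to n-1 around this function reproduces the full calculation in constant memory, at the cost of reading the input file n-1 times.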
The above program can easily be modified to compute the percent differences over a range of lags other than 1..n-1, but it is by no means fast. I ask again, though: do you really need the percent difference at all lags? And is the percent difference what you are finally looking for, or are there other operations you will apply to the result? If so, it would probably be best to compute each percent difference only at the moment its value is needed; you would then get more useful work done per iteration than by computing all the differences first and applying your other operations afterwards.

You could use the Print function to periodically return intermediate results if you are looking for a value or set of values of particular interest, and then abort the calculation once your criteria are met. Other than that, I see little chance of analyzing your data set without somehow reducing its size; it is simply too large.

Regards,
Sseziwa