MathGroup Archive: February 2002 [00215]


Re: RE: Re: Newbie Question

To: mathgroup at smc.vnet.net
Subject: [mg32819] Re: [mg32776] RE: [mg32728] Re: [mg32686] Newbie Question
From: Sseziwa Mukasa <mukasa at jeol.com>
Date: Thu, 14 Feb 2002 01:43:38 -0500 (EST)
Organization: JEOL (USA) Ltd.
References: <200202091011.FAA16310@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

"Wolf, Hartmut" wrote:

>
> Sseziwa,
>
> this is the right way to deal with the problem, I think. Here is
> just a little modification of this idea, and another one following
> a suggestion from Andrzej Kozlowski:
>
>
> In[1]:= ll = Table[Random[Real, {0, 100}], {300}];
>
> your suggestion:
>
> In[2]:=
> f[x_, m_Integer] := Block[{a = Drop[x, m]}, (a - Drop[x, -m])/a]
> In[3]:=
> r1 = Table[f[ll, i], {i, Length[ll] - 1}];
>
> my variant:
>
> In[4]:=
> r2 = (#1 - #2)/#1 & @@@
>       Drop[NestList[{Drop[First[#], 1], Drop[Last[#], -1]} &, {ll, ll},
>           Length[ll] - 1], 1];
>
> In[5]:= r1 == r2
> Out[5]= True
>
> The idea was: dropping the first or last element of a list
> is faster than dropping half of it.
>

I've tried it myself and this seems to be the case, which is unsurprising:
Drop is probably an O(n) operation on a list.  However, I would not have
expected it to hold for packed arrays, yet packing does not change the
timings at all.

>
> Another idea (trying to use efficient list operations):
>
> <deleted> Timing however excludes this variant; compare the other two:
>
>
> In[9]:= ll = Table[Random[Real, {0, 100}], {4000}];
>
> In[16]:=
> (r = Table[f[ll, i], {i, Length[ll] - 1}]); // Timing
> Out[16]=
> {20.57 Second, Null}
> In[17]:= Remove[r];
>
> In[18]:=
> (r = (#1 - #2)/#1 & @@@
>           NestList[{Drop[First[#], 1], Drop[Last[#], -1]} &,
>                    {Drop[ll, 1], Drop[ll, -1]},
>                     Length[ll] - 2]); // Timing
> Out[18]=
> {13.359 Second, Null}
> In[19]:= Remove[r];
>
> Your variant, however, is more economical with memory, and I was
> not able to run the test with a list of 10000. (400 MHz P II
> Notebook, 192 MB real memory). I estimate that 10000 will take
> about 3 minutes. This would mean 10^8 data will take at least
> several days, perhaps too much to make any predictions.
>

For an input list of size n the resulting data set is going to be of size
0.5 * (n^2 - n), each element of which is the result of one subtract and
one divide operation.  For n = 10^8 that is about 10^16 operations, so even
assuming a rate of one operation per processor cycle it will take roughly
290 days to process your entire data set on a 400MHz processor, regardless
of the language used.  Do you really need the percent differences at all
lags?
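
The element count quoted above comes from summing the number of pairs
available at each lag.  A short worked version, taking n = 10^8 from the
quoted message as the assumed data set size and counting two operations
(one subtract, one divide) per element:

```latex
\[
N_{\text{pairs}} = \sum_{\ell=1}^{n-1} (n - \ell) = \frac{n(n-1)}{2}
\approx 5 \times 10^{15} \quad (n = 10^{8}),
\qquad
N_{\text{ops}} = 2\,N_{\text{pairs}} \approx 10^{16}.
\]
```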

>
> Anyway, I agree with you that this is not the right task for
> Mathematica: producing a vast quantity of numbers with little
> information content, from which you will finally have to read off
> something. What? How?
>

If Mathematica had a file seek operation that allowed writing at an
arbitrary position in a file, it probably wouldn't be much worse than any
other programming language at solving this problem.  Writing at an
arbitrary position also depends on the file system, and I'm not sure
whether Windows file systems have that capability or whether Mathematica
exposes it.  However, since Mathematica can only write sequentially, I
suppose you could do the following:

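(* Percent differences at every lag, reading at most m values per list. *)
f[infile_, outfile_, n_, m_] :=
  Block[{a, b, instrma = OpenRead[infile], instrmb = OpenRead[infile],
      outstrm = OpenWrite[outfile], rem},
    Do[Skip[instrmb, Real, lag];   (* b starts lag elements ahead of a *)
      Do[a = ReadList[instrma, Real, m];
        If[lag > m, b = ReadList[instrmb, Real, m],
          (* chunks overlap: reuse the tail of a, read only lag new values *)
          SetStreamPosition[instrmb, StreamPosition[instrma]];
          b = Flatten[{a[[Range[lag + 1, m]]],
                ReadList[instrmb, Real, lag]}]];
        (* parentheses are needed because /@ binds more tightly than / *)
        Write[outstrm, #] & /@ ((b - a)/b), {Quotient[n - lag, m]}];
      rem = Mod[n - lag, m];   (* elements left over after the full chunks *)
      If[rem != 0, a = ReadList[instrma, Real, rem];
        If[lag > rem, b = ReadList[instrmb, Real, rem],
          SetStreamPosition[instrmb, StreamPosition[instrma]];
          b = Flatten[{a[[Range[lag + 1, rem]]],
                ReadList[instrmb, Real, lag]}]];
        Write[outstrm, #] & /@ ((b - a)/b)];
      SetStreamPosition[instrma, 0];   (* rewind both streams for the next lag *)
      SetStreamPosition[instrmb, 0], {lag, n - 1}];
    Close[instrma]; Close[instrmb]; Close[outstrm]]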

where infile and outfile are the names of the input and output files
respectively, n is the length of the data set and m is the maximum length
of any list held in memory during the calculation.  It is assumed that
m < n.  The results are written one value per line, lag by lag.  The above
program can easily be modified to compute the percent differences over a
range of lags other than 1..n-1, but it is by no means fast.

I ask again, though: do you really need the percent difference at all lags?
Also, is the percent difference what you are finally looking for, or are
there other operations you will apply to the result?  If so, it would
probably be best to compute the percent difference only at the time at
which the value is needed.  Then you'll be getting more useful work done
per iteration than by computing the differences first and then applying
your other operations.  You could use the Print function to periodically
return the results of your calculation if you are looking for a value or
set of values of particular interest.  You could then abort your
calculation when your criteria are met.  Other than that, I see little
chance of analyzing your data set without reducing its size somehow; it is
simply too large.
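
As a rough illustration of computing values only when they are needed and
stopping once a criterion is met, here is a minimal sketch.  It assumes the
data fit in memory as a list, and it uses Catch and Throw to stop early
rather than Print and a manual abort.  The names pctDiff and firstMatch are
hypothetical helpers made up for this example.

```mathematica
(* Hypothetical helper: percent difference between element i + lag and
   element i, computed only when it is asked for. *)
pctDiff[x_List, lag_Integer, i_Integer] :=
  (x[[i + lag]] - x[[i]])/x[[i + lag]]

(* Hypothetical helper: walk through lags and positions in order and stop
   at the first value satisfying test.  Returns {lag, i, value}, or None
   if no value satisfies the test. *)
firstMatch[x_List, test_] :=
  Catch[
    Do[
      With[{v = pctDiff[x, lag, i]},
        If[test[v], Throw[{lag, i, v}]]],
      {lag, 1, Length[x] - 1}, {i, 1, Length[x] - lag}];
    None]

(* Example: first pair whose percent difference exceeds 0.5 *)
firstMatch[{10., 12., 30., 15.}, # > 0.5 &]
```

Nothing is stored here beyond the single value being tested, so memory use
stays constant regardless of how many lags are examined.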

Regards,

Sseziwa

References:
- RE: Re: Newbie Question
  - From: "Wolf, Hartmut" <Hartmut.Wolf@t-systems.com>
