MathGroup Archive: February 2002 [00339]

[Date Index] [Thread Index] [Author Index]

RE (2): Position within a list

To: mathgroup at smc.vnet.net
Subject: [mg32949] RE (2): [mg32920] Position within a list
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Thu, 21 Feb 2002 02:07:01 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

> -----Original Message-----
> From:	Dana DeLouis [SMTP:ng_only at hotmail.com]
To: mathgroup at smc.vnet.net
> Sent:	Tuesday, February 19, 2002 8:30 AM
> To:	mathgroup at smc.vnet.net
> Subject:	[mg32920] Position within a list
> 
> Hello.  A long time ago someone posted an elegant solution, but I can not
> find it in the Archives.
> Given a list that may have repeating data, this returned the character
> that
> was repeated the most, and its position..
> 
> The example given was a list of the first 10,000 digits of Pi.  I believe
> the answer was that the number 9 was repeated 6 times around position
> 700 or 800.
> The function was very short and elegant.
> I don't believe the Split function was used, but I might be wrong.
> 
> Does anyone know how to do this?  Thank you.
> 
> lst = RealDigits[Pi,10,10000][[1]];
> 
> The list would have...
> {3, 1, 4, 1, 5, 9, 2, 6, etc)
> 
> 
> --
> Dana DeLouis
> Windows XP  & Mathematica 4.1
> = = = = = = = = = = = = = = = = =
> 
[Hartmut Wolf] 

In the evening I got another idea as how to improve the "partition"
solutions. The idea is to use smaller partitions and then fix the partial
solutions:

In[24]:=
Function[{s},
Module[{sameparts, candidates, repetitions, startseq = 0, r = 0, maxreps},
  sameparts = Apply[SameQ, Partition[lst, s, 1], {1}];
  candidates = Flatten[Position[sameparts, True]];
  repetitions = (If[# == startseq + (++r - s), r, 
                        (startseq = #; r = s)] &) /@ 
                candidates;
  maxreps = Max[repetitions];
  Extract[candidates, 
      Position[repetitions, maxreps, {1}, 1] - maxreps + s] + 
        {{0, maxreps - 1}}
] // Timing] /@ Range[2, 7]

>From In[24]:=
Thread::"tdlen" : "Objects of unequal length....cannot be combined."

Out[24]=
{{0.4 Second, {{763, 768}}}, 
 {0.221 Second, {{763, 768}}}, 
 {0.2 Second, {{763, 768}}}, 
 {0.22 Second, {{763, 768}}}, 
 {0.231 Second, {{763, 768}}}, 
 {0.25 Second, {} + {{0, -\[Infinity]}}}}

No longer need you know the exact no of repetions, only avoid to guess too
much.

In this case it was best to search for repetions of 4; then we got only few
candidates at shorter partitions.

The split solution is only marginally better. A second look at this...

 
In[14]:= splitlst = Split[lst]; // Timing
Out[14]= {0.091 Second, Null}

In[15]:= (lenlst = Length /@ splitlst; 
    pos = Position[lenlst, xlen = Max[lenlst], {1}, 1][[1, 1]]) // Timing
Out[15]= {0.11 Second, 695}

In[16]:= Plus @@ Take[lenlst, pos - 1] + {1, xlen} // Timing
Out[16]= {0.01 Second, {763, 768}}

In[17]:= Length[Flatten[Take[splitlst, pos - 1]]] + {1, xlen} // Timing
Out[17]= {0.01 Second, {763, 768}}

...shows that Split is quite fast, a little more time is spent looking at
lengths. Retrieving the original position in lst is fast (even when the
maximal runlength was found deeper in lst); Line 17 is an alternative to
applying Plus, with same (low) costs.

It is a natural thought to avoid the latter and pass through the original
position information (Allen Hayes solution does this), yet it has
repercussions on Split, see:

In[6]:= trlst = Transpose[{lst, Range[10000]}]; // Timing
Out[6]= {0. Second, Null}

In[7]:= splitlst2 = Split[trlst, SameQ[First[#1], First[#2]] &]; // Timing
Out[7]= {0.811 Second, Null}

In[8]:= lenlst2 = Length /@ splitlst2; // Timing
Out[8]= {0.08 Second, Null}

In[9]:= (pos = Position[lenlst2, Max[lenlst2], {1}, 1][[1, 1]]); // Timing
Out[9]= {0.01 Second, Null}

In[10]:= splitlst2[[pos]]
Out[10]=
{{9, 763}, {9, 764}, {9, 765}, {9, 766}, {9, 767}, {9, 768}}

The burden on Split with the test is high, nearly a factor of ten. This
phenomenon is similar to an observation with Sort. This is understandable,
as the test leaves the kernel and reenters the evaluation main loop. However
SameQ and First are first class kernel functions, and should be "compiled"
to minute cpu actions. (Own attempts to use Compile are to no avail.)

Perhaps some time Wolfram will do something on this; improvements to the
finer forms of Split, Sort etc. could enhance the reach of usability of
Mathematica.

In[43]:= $OperatingSystem
         $Version
         $UserName
Out[43]= "WindowsNT"
Out[44]= "4.1 for Microsoft Windows (November 2, 2000)"
Out[45]= "hwolf"

Prev by Date: Re: Numerical Differentiation using Fourier Transform

Next by Date: Using Symbols with Plot

Previous by thread: Re: Getting Coordinates from plot

Next by thread: Using Symbols with Plot