RE (2): Position within a list
- To: mathgroup at smc.vnet.net
- Subject: [mg32949] RE (2): [mg32920] Position within a list
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Thu, 21 Feb 2002 02:07:01 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
> -----Original Message----- > From: Dana DeLouis [SMTP:ng_only at hotmail.com] To: mathgroup at smc.vnet.net > Sent: Tuesday, February 19, 2002 8:30 AM > To: mathgroup at smc.vnet.net > Subject: [mg32920] Position within a list > > Hello. A long time ago someone posted an elegant solution, but I can not > find it in the Archives. > Given a list that may have repeating data, this returned the character > that > was repeated the most, and its position.. > > The example given was a list of the first 10,000 digits of Pi. I believe > the answer was that the number 9 was repeated 6 times around position > 700 or 800. > The function was very short and elegant. > I don't believe the Split function was used, but I might be wrong. > > Does anyone know how to do this? Thank you. > > lst = RealDigits[Pi,10,10000][[1]]; > > The list would have... > {3, 1, 4, 1, 5, 9, 2, 6, etc) > > > -- > Dana DeLouis > Windows XP & Mathematica 4.1 > = = = = = = = = = = = = = = = = = > [Hartmut Wolf] In the evening I got another idea as how to improve the "partition" solutions. The idea is to use smaller partitions and then fix the partial solutions: In[24]:= Function[{s}, Module[{sameparts, candidates, repetitions, startseq = 0, r = 0, maxreps}, sameparts = Apply[SameQ, Partition[lst, s, 1], {1}]; candidates = Flatten[Position[sameparts, True]]; repetitions = (If[# == startseq + (++r - s), r, (startseq = #; r = s)] &) /@ candidates; maxreps = Max[repetitions]; Extract[candidates, Position[repetitions, maxreps, {1}, 1] - maxreps + s] + {{0, maxreps - 1}} ] // Timing] /@ Range[2, 7] >From In[24]:= Thread::"tdlen" : "Objects of unequal length....cannot be combined." Out[24]= {{0.4 Second, {{763, 768}}}, {0.221 Second, {{763, 768}}}, {0.2 Second, {{763, 768}}}, {0.22 Second, {{763, 768}}}, {0.231 Second, {{763, 768}}}, {0.25 Second, {} + {{0, -\[Infinity]}}}} No longer need you know the exact no of repetions, only avoid to guess too much. In this case it was best to search for repetions of 4; then we got only few candidates at shorter partitions. The split solution is only marginally better. A second look at this... In[14]:= splitlst = Split[lst]; // Timing Out[14]= {0.091 Second, Null} In[15]:= (lenlst = Length /@ splitlst; pos = Position[lenlst, xlen = Max[lenlst], {1}, 1][[1, 1]]) // Timing Out[15]= {0.11 Second, 695} In[16]:= Plus @@ Take[lenlst, pos - 1] + {1, xlen} // Timing Out[16]= {0.01 Second, {763, 768}} In[17]:= Length[Flatten[Take[splitlst, pos - 1]]] + {1, xlen} // Timing Out[17]= {0.01 Second, {763, 768}} ...shows that Split is quite fast, a little more time is spent looking at lengths. Retrieving the original position in lst is fast (even when the maximal runlength was found deeper in lst); Line 17 is an alternative to applying Plus, with same (low) costs. It is a natural thought to avoid the latter and pass through the original position information (Allen Hayes solution does this), yet it has repercussions on Split, see: In[6]:= trlst = Transpose[{lst, Range[10000]}]; // Timing Out[6]= {0. Second, Null} In[7]:= splitlst2 = Split[trlst, SameQ[First[#1], First[#2]] &]; // Timing Out[7]= {0.811 Second, Null} In[8]:= lenlst2 = Length /@ splitlst2; // Timing Out[8]= {0.08 Second, Null} In[9]:= (pos = Position[lenlst2, Max[lenlst2], {1}, 1][[1, 1]]); // Timing Out[9]= {0.01 Second, Null} In[10]:= splitlst2[[pos]] Out[10]= {{9, 763}, {9, 764}, {9, 765}, {9, 766}, {9, 767}, {9, 768}} The burden on Split with the test is high, nearly a factor of ten. This phenomenon is similar to an observation with Sort. This is understandable, as the test leaves the kernel and reenters the evaluation main loop. However SameQ and First are first class kernel functions, and should be "compiled" to minute cpu actions. (Own attempts to use Compile are to no avail.) Perhaps some time Wolfram will do something on this; improvements to the finer forms of Split, Sort etc. could enhance the reach of usability of Mathematica. In[43]:= $OperatingSystem $Version $UserName Out[43]= "WindowsNT" Out[44]= "4.1 for Microsoft Windows (November 2, 2000)" Out[45]= "hwolf"