MathGroup Archive 2004

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Counting Runs

  • To: mathgroup at smc.vnet.net
  • Subject: [mg52170] Re: Counting Runs
  • From: "Carl K. Woll" <carlw at u.washington.edu>
  • Date: Sat, 13 Nov 2004 04:40:22 -0500 (EST)
  • Organization: University of Washington
  • References: <200411040650.BAA18131@smc.vnet.net> <200411050717.CAA06890@smc.vnet.net> <opsg0lmi1fiz9bcq@monster.cox-internet.com> <opsg0pe8woiz9bcq@monster.cox-internet.com> <D9733954-2F93-11D9-85F1-000A95ED10EE@yale.edu> <cmhvag$prj$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Hi all,

This problem reminded me of one from a couple years ago where we needed to 
count the occurences of a 1 followed by a 0 in a list of 1s and 0s. Knowing 
how many times a 1 is followed by a 0 essentially tells you how many runs of 
1s there are. There the fastest method ended up using Tr and BitXor.

This suggests that we can count the number of runs of any particular integer 
in a sequence of integers if we can convert the sequence to 1s for that 
integer and 0s for everything else. Doing this conversion turned out to be 
troublesome, but I managed to come up with a method that is pretty fast. If 
the sequence is data, then the following expression will have 1s for the 
integer int and 0s for everything else:

Abs@Quotient[#,#+1,1]&@Abs[data-int]

Now that we have an expression of just 1s and 0s, we can count how many runs 
of 1s there are. The following function counts the number of runs of int in 
the sequence data:

runs[int_,data_]:=Module[{modlist},
 modlist=Abs@Quotient[#,#+1,1]&@Abs[data-int];
 Tr[BitXor[modlist,RotateRight[modlist]]]/2+BitAnd[modlist[[1]],modlist[[-1]]]]For a sequence of a million integers, the above runs function can find thenumber of runs of a single integer about 7 to 10 or more times faster (on mymachine) than your hanlonTreat function can find the number of runs of allthe integers, depending on how many different integers are in the sequence.So, if there are fewer than 7 different integers it would be faster to usethe above runs function to find the occurrences of all the runs.It turns out that going from Abs[data-int] which has 0s for every occurenceof int and positive integers for the other integers to 1s for int and 0s foreverything else takes the majority of the time (over 75%). A nice challengewould be to come up with a faster method to convert a sequence of 0s andpositive integers to 0s and 1s, where 1 replaces every positive integer.I'll post this challenge in another thread.Carl Woll"DrBob" <drbob at bigfoot.com> wrote in messagenews:cm!
 hvag$prj$1 at smc.vnet.net...> I've updated my notebook again, under the Run Counts link at:>> http://eclecticdreams.net/DrBob/mathematica.htm>> I'm not sure whether solver performance depends mostly on the number ofruns, or the number of different values in a data list. The two are somewhatinversely related, of course.>> The fastest solvers are brt4 (using Frequencies) and hanlonTreat (hanlon3,with Part instead of Map).>> Bobby>> On Fri, 5 Nov 2004 20:33:23 -0500, János <janos.lobb at yale.edu> wrote:>>> It must be machine or OS dependent.>>>> I re-discovered Hanlon3 method :) and  ran it with Bobby's newest.  I>> don't have Bobby's data so I generated random didgits in the 0-9 range>>>> Here are the results:>>>> In[28]:=>> v = Table[Random[Integer,>>       {0, 9}], {i, 1, 10^7}];>>>> In[29]:=>> Timing[({First[#1],>>       Length[#1]} & ) /@>>     Split[Sort[First /@>>        Split[v]]]]>> Out[29]=>> {35.58*Second, {{0, 898901},>>     {1, 899397}, {2, 901191},>>     {3, 899449},!
  {4, 900824},>>     {5, 900262}, {6, 899338},>>     {7, 900293}, {8, 9
00196},>>     {9, 901311}}}>>>> In[32]:=>> Timing[({First[#1],>>       Length[#1]} & ) /@>>     Split[Sort[Split[v][[All,>>        1]]]]]>> Out[32]=>> {38.67999999999998*Second,>>    {{0, 898901}, {1, 899397},>>     {2, 901191}, {3, 899449},>>     {4, 900824}, {5, 900262},>>     {6, 899338}, {7, 900293},>>     {8, 900196}, {9, 901311}}}>>>> My machine is a 1.25Ghz G4 with 2G Ram and with OSX 10.3.5.>>>> János>> On Nov 5, 2004, at 7:38 PM, DrBob wrote:>>>>> I found an even faster (rather obvious) solution:>>>>>> hanlonTreat[v_] := {First@#, Length@#} & /@ Split@Sort[Split[v][[All,>>> 1]]]>>>>>> It about 80% faster than hanlon4.>>>>>> Bobby>>>>>> On Fri, 05 Nov 2004 17:16:56 -0600, DrBob <drbob at bigfoot.com> wrote:>>>>>>> I timed the posted methods except Andrzej's -- it's the only one that>>>> works only for +1/-1 data -- plus a couple of my own that I haven't>>>> posted. David Park's method seems the same as the fastest method,>>>> hanlon3. I modified all methods to return a !
 pair {x, number of runs>>>> in x} for each x in the data.>>>>>>>> Two of Bob Hanlon's methods beat all the rest of us -- but one of his>>>> is the slowest method, too.>>>>>>>> I've posted a notebook at the Run Counts link at:>>>>>>>> http://eclecticdreams.net/DrBob/mathematica.htm>>>>>>>> Bobby>>>>>>>> On Fri, 5 Nov 2004 02:17:54 -0500 (EST), Selwyn Hollis>>>> <sh2.7183 at misspelled.erthlink.net> wrote:>>>>>>>>> Hi Greg,>>>>>>>>>> The following seems to work pretty well:>>>>>>>>>>    runscount[lst_?VectorQ] :=>>>>>      Module[{elems, flips, counts},>>>>>        elems = Union[lst];>>>>>        flips = Cases[Partition[lst, 2, 1], {x_, y_} /; x =!= y];>>>>>        counts = {#, Count[Most[flips], {#, _}]} & /@ elems;>>>>>        {x1, x2} = Last[flips];>>>>>        counts /. {{x1, y_} -> {x1, y+1}, {x2, y_} -> {x2, y+1}}]>>>>>>>>>> Example:>>>>>>>>>>   Table[Random[Integer, {1, 5}], {20}]>>>>>   runscount[%]>>>>>>>>>>       {2, 2, 3, 1, 3, 2, 2, 3, 1, 1, 2, 3, 1, 1, 3, 1, 1, 2, 2!
 , 2}>>>>>>>>>>       {{1, 4}, {2, 4}, {3, 5}}>>>>>>>>>>>>>>> ----->>>>
> Selwyn Hollis>>>>> http://www.appliedsymbols.com>>>>> (edit reply-to to reply)>>>>>>>>>>>>>>> On Nov 4, 2004, at 1:50 AM, Gregory Lypny wrote:>>>>>>>>>>> Looking for an elegant way to count runs to numbers in a series.>>>>>> Suppose I have a list of ones and negative ones such as>>>>>> v={1,1,1,-1,1,1,1,1,1,-1,-1,-1,-1,1}.>>>>>> I'd like to create a function that counts the number of runs of 1s>>>>>> and>>>>>> -1s, which in this case is 3 and 2.>>>>>>>>>>>> Greg>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -->>> DrBob at bigfoot.com>>> www.eclecticdreams.net>>>>>>>>>> ------------------------------------------------------------------->> János Löbb>> Yale University School of Medicine>> Department of Pathology>> Phone: 203-737-5204>> Fax:      203-785-7303>> E-mail: janos.lobb at yale.edu>>>>>>>>>>>> --> DrBob at bigfoot.com> www.eclecticdreams.net>


  • Prev by Date: Re: Why these definitions are so slow
  • Next by Date: scroll problem with mouse
  • Previous by thread: Re: Re: Re: Counting Runs
  • Next by thread: Re: Counting Runs