Re: Counting Runs
- To: mathgroup at smc.vnet.net
- Subject: [mg52170] Re: Counting Runs
- From: "Carl K. Woll" <carlw at u.washington.edu>
- Date: Sat, 13 Nov 2004 04:40:22 -0500 (EST)
- Organization: University of Washington
- References: <200411040650.BAA18131@smc.vnet.net> <200411050717.CAA06890@smc.vnet.net> <opsg0lmi1fiz9bcq@monster.cox-internet.com> <opsg0pe8woiz9bcq@monster.cox-internet.com> <D9733954-2F93-11D9-85F1-000A95ED10EE@yale.edu> <cmhvag$prj$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Hi all, This problem reminded me of one from a couple years ago where we needed to count the occurences of a 1 followed by a 0 in a list of 1s and 0s. Knowing how many times a 1 is followed by a 0 essentially tells you how many runs of 1s there are. There the fastest method ended up using Tr and BitXor. This suggests that we can count the number of runs of any particular integer in a sequence of integers if we can convert the sequence to 1s for that integer and 0s for everything else. Doing this conversion turned out to be troublesome, but I managed to come up with a method that is pretty fast. If the sequence is data, then the following expression will have 1s for the integer int and 0s for everything else: Abs@Quotient[#,#+1,1]&@Abs[data-int] Now that we have an expression of just 1s and 0s, we can count how many runs of 1s there are. The following function counts the number of runs of int in the sequence data: runs[int_,data_]:=Module[{modlist}, modlist=Abs@Quotient[#,#+1,1]&@Abs[data-int]; Tr[BitXor[modlist,RotateRight[modlist]]]/2+BitAnd[modlist[[1]],modlist[[-1]]]]For a sequence of a million integers, the above runs function can find thenumber of runs of a single integer about 7 to 10 or more times faster (on mymachine) than your hanlonTreat function can find the number of runs of allthe integers, depending on how many different integers are in the sequence.So, if there are fewer than 7 different integers it would be faster to usethe above runs function to find the occurrences of all the runs.It turns out that going from Abs[data-int] which has 0s for every occurenceof int and positive integers for the other integers to 1s for int and 0s foreverything else takes the majority of the time (over 75%). A nice challengewould be to come up with a faster method to convert a sequence of 0s andpositive integers to 0s and 1s, where 1 replaces every positive integer.I'll post this challenge in another thread.Carl Woll"DrBob" <drbob at bigfoot.com> wrote in messagenews:cm! hvag$prj$1 at smc.vnet.net...> I've updated my notebook again, under the Run Counts link at:>> http://eclecticdreams.net/DrBob/mathematica.htm>> I'm not sure whether solver performance depends mostly on the number ofruns, or the number of different values in a data list. The two are somewhatinversely related, of course.>> The fastest solvers are brt4 (using Frequencies) and hanlonTreat (hanlon3,with Part instead of Map).>> Bobby>> On Fri, 5 Nov 2004 20:33:23 -0500, János <janos.lobb at yale.edu> wrote:>>> It must be machine or OS dependent.>>>> I re-discovered Hanlon3 method :) and ran it with Bobby's newest. I>> don't have Bobby's data so I generated random didgits in the 0-9 range>>>> Here are the results:>>>> In[28]:=>> v = Table[Random[Integer,>> {0, 9}], {i, 1, 10^7}];>>>> In[29]:=>> Timing[({First[#1],>> Length[#1]} & ) /@>> Split[Sort[First /@>> Split[v]]]]>> Out[29]=>> {35.58*Second, {{0, 898901},>> {1, 899397}, {2, 901191},>> {3, 899449},! {4, 900824},>> {5, 900262}, {6, 899338},>> {7, 900293}, {8, 9 00196},>> {9, 901311}}}>>>> In[32]:=>> Timing[({First[#1],>> Length[#1]} & ) /@>> Split[Sort[Split[v][[All,>> 1]]]]]>> Out[32]=>> {38.67999999999998*Second,>> {{0, 898901}, {1, 899397},>> {2, 901191}, {3, 899449},>> {4, 900824}, {5, 900262},>> {6, 899338}, {7, 900293},>> {8, 900196}, {9, 901311}}}>>>> My machine is a 1.25Ghz G4 with 2G Ram and with OSX 10.3.5.>>>> János>> On Nov 5, 2004, at 7:38 PM, DrBob wrote:>>>>> I found an even faster (rather obvious) solution:>>>>>> hanlonTreat[v_] := {First@#, Length@#} & /@ Split@Sort[Split[v][[All,>>> 1]]]>>>>>> It about 80% faster than hanlon4.>>>>>> Bobby>>>>>> On Fri, 05 Nov 2004 17:16:56 -0600, DrBob <drbob at bigfoot.com> wrote:>>>>>>> I timed the posted methods except Andrzej's -- it's the only one that>>>> works only for +1/-1 data -- plus a couple of my own that I haven't>>>> posted. David Park's method seems the same as the fastest method,>>>> hanlon3. I modified all methods to return a ! pair {x, number of runs>>>> in x} for each x in the data.>>>>>>>> Two of Bob Hanlon's methods beat all the rest of us -- but one of his>>>> is the slowest method, too.>>>>>>>> I've posted a notebook at the Run Counts link at:>>>>>>>> http://eclecticdreams.net/DrBob/mathematica.htm>>>>>>>> Bobby>>>>>>>> On Fri, 5 Nov 2004 02:17:54 -0500 (EST), Selwyn Hollis>>>> <sh2.7183 at misspelled.erthlink.net> wrote:>>>>>>>>> Hi Greg,>>>>>>>>>> The following seems to work pretty well:>>>>>>>>>> runscount[lst_?VectorQ] :=>>>>> Module[{elems, flips, counts},>>>>> elems = Union[lst];>>>>> flips = Cases[Partition[lst, 2, 1], {x_, y_} /; x =!= y];>>>>> counts = {#, Count[Most[flips], {#, _}]} & /@ elems;>>>>> {x1, x2} = Last[flips];>>>>> counts /. {{x1, y_} -> {x1, y+1}, {x2, y_} -> {x2, y+1}}]>>>>>>>>>> Example:>>>>>>>>>> Table[Random[Integer, {1, 5}], {20}]>>>>> runscount[%]>>>>>>>>>> {2, 2, 3, 1, 3, 2, 2, 3, 1, 1, 2, 3, 1, 1, 3, 1, 1, 2, 2! , 2}>>>>>>>>>> {{1, 4}, {2, 4}, {3, 5}}>>>>>>>>>>>>>>> ----->>>> > Selwyn Hollis>>>>> http://www.appliedsymbols.com>>>>> (edit reply-to to reply)>>>>>>>>>>>>>>> On Nov 4, 2004, at 1:50 AM, Gregory Lypny wrote:>>>>>>>>>>> Looking for an elegant way to count runs to numbers in a series.>>>>>> Suppose I have a list of ones and negative ones such as>>>>>> v={1,1,1,-1,1,1,1,1,1,-1,-1,-1,-1,1}.>>>>>> I'd like to create a function that counts the number of runs of 1s>>>>>> and>>>>>> -1s, which in this case is 3 and 2.>>>>>>>>>>>> Greg>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -->>> DrBob at bigfoot.com>>> www.eclecticdreams.net>>>>>>>>>> ------------------------------------------------------------------->> János Löbb>> Yale University School of Medicine>> Department of Pathology>> Phone: 203-737-5204>> Fax: 203-785-7303>> E-mail: janos.lobb at yale.edu>>>>>>>>>>>> --> DrBob at bigfoot.com> www.eclecticdreams.net>
- References:
- Counting Runs
- From: Gregory Lypny <gregory.lypny@videotron.ca>
- Re: Counting Runs
- From: Selwyn Hollis <sh2.7183@misspelled.erthlink.net>
- Counting Runs