Re: Counting Runs

*To*: mathgroup at smc.vnet.net*Subject*: [mg52194] Re: Counting Runs*From*: DrBob <drbob at bigfoot.com>*Date*: Sun, 14 Nov 2004 04:30:56 -0500 (EST)*References*: <001601c4c8d4$6aecc3e0$6400a8c0@Main> <opsheakrjtiz9bcq@monster.cox-internet.com> <006b01c4c998$29b9a220$6400a8c0@Main>*Reply-to*: drbob at bigfoot.com*Sender*: owner-wri-mathgroup at wolfram.com

> I think Abs, Tr, and BitXor are very fast because they are probably > optimized for integer input and the use of packed arrays. I don't think an optimization for integers can be effective if functions take time to CHECK whether the array contains only integers. Is that precomputed for packed arrays, somehow? Bobby On Sat, 13 Nov 2004 10:47:59 -0500, Carl K. Woll <carl at woll2woll.com> wrote: > Hi Bob, > > I think Abs, Tr, and BitXor are very fast because they are probably > optimized for integer input and the use of packed arrays. On the other hand, > I doubt that there are any such optimizations for integers or packed arrays > for the functions Part, Split and Sort. > > At any rate, Andrzej Kozlowsky has kindly provided an improvement, so that > the following runs function is almost twice as fast as my previous one. > > runs[int_,data_]:=Module[{modlist}, > modlist=Sign[Abs[data-int]]; > Tr[BitXor[modlist,RotateRight[modlist]]]/2+1-BitOr[modlist[[1]],modlist[[-1]]]]Carl Woll----- Original Message -----From: "DrBob" <drbob at bigfoot.com>To: "Carl K. Woll" <carlw at u.washington.edu>Sent: Saturday, November 13, 2004 3:44 AMSubject: [mg52194] Re: Counting Runs> Carl,>> Here are timings for substeps of "runs" and "hanlonTreat":>> data = sample[10^6];> Timing[{Timing[s = data - 1; Subtract],> Timing[s = Abs[s]; Abs],> Timing[(Abs[Quotient[#1, #1 + 1, 1]] & )[s];> Abs[Quotient]]}]> Timing[(Abs[Quotient[#1, #1 + 1, 1]] & )[Abs[data - 1]];> modlist]> Timing[runs[1, data]; runs]> Timing[{Timing[s = Split[data]; Print[Length[s]]; Split],> Timing[s = s[[All,1]]; Part], Timing[s = Sort[s]; Sort],> Timing[s = Split[s]; Split],> Timing[({First[#1], Length[#1]} & ) /@ s; Map]}]> Timing[hanlonTreat[data]; hanlonTreat]>> {0.11 Second,{{0.032 Second,Subtract},{0. Second,Abs},{> 0.078 Second,Abs[Quotient]}}}> {0.125 Second,modlist}> {0.125 Second,run! s}! > 687139> ! To: mathgroup at smc.vnet.net > {0.656 Second,{{0.25 Second,> Split},{0.156 Second,Part},{0.187 Second,> Sort},{0.063 Second,Split},{0. Second,Map}}}> {0.641 Second,hanlonTreat}>> Within "hanlonTreat", it appears that Split, Part, and Sort are eachslower than "runs" in its entirety, even though Part and Split are operatingon a shorter list than "data". Meanwhile, "runs" is making five passesthrough "data" or an equally long list.>> Part takes 2/3 as much time as Sort, but Abs, Tr, BitXor, and RotateRighttake almost no time at all.>> A lot of this seems very counterintuitive.>> If WRI made it clear what functions are well-optimized and which are not,it would be very helpful.>> Bobby>> On Fri, 12 Nov 2004 11:26:48 -0500, Carl K. Woll <carlw at u.washington.edu>wrote:>>>>>>> Hi all,>>>> This problem reminded me of one from a couple years ago where we neededto>> count the occurences of a 1 followed by a 0 in a list of 1s and 0s.Knowing>> how many times a 1 is followed by a 0 essentially tells you how man! y ! ! > runsof>> 1s there are. There the fastest method ended up using Tr and > BitXor.>>>> This suggests that we can count the number of runs of any particularinteger>> in a sequence of integers if we can convert the sequence to 1s for that>> integer and 0s for everything else. Doing this conversion turned out tobe>> troublesome, but I managed to come up with a method that is pretty fast.If>> the sequence is data, then the following expression will have 1s for the>> integer int and 0s for everything else:>>>> Abs@Quotient[#,#+1,1]&@Abs[data-int]>>>> Now that we have an expression of just 1s and 0s, we can count how manyruns>> of 1s there are. The following function counts the number of runs of intin>> the sequence data:>>>> runs[int_,data_]:=Module[{modlist},>> modlist=Abs@Quotient[#,#+1,1]&@Abs[data-int];>>Tr[BitXor[modlist,RotateRight[modlist]]]/2+BitAnd[modlist[[1]],modlist[[-1]]]]For a sequence of a million integers, the above runs function can findthenumber of runs of a single integer about 7 to 10 or more times faster (onmymachine) than your h! an! ! > lonTreat function can find the number of runs ofallthe integers, depending on how many different integers are in thesequence.So, if there are fewer than 7 different integers it would be fasterto usethe above runs function to find the occurrences of all the runs.Itturns out that going from Abs[data-int] which has 0s for every occurenceofint and positive integers for the other integers to 1s for int and 0sforeverything else takes the majority of the time (over 75%). A nicechallengewould be to come up with a faster method to convert a sequence of0s andpositive integers to 0s and 1s, where 1 replaces every positiveinteger.I'll post this challenge in another thread.Carl Woll"DrBob"<drbob at bigfoot.com> wrote in messagenews:!> cm!>> hvag$prj$1 at smc.vnet.net...> I've updated my notebook again, under theRun Counts link at:>> http://eclecticdreams.net/DrBob/mathematica.htm>> I'mnot sure whether solver performance depends mostly on the number ofruns, orthe number of different values ! in! ! > a data list. The two are somewhatinverselyrelated, of course.>> The f > astest solvers are brt4 (using Frequencies) andhanlonTreat (hanlon3,with Part instead of Map).>> Bobby>> On Fri, 5 Nov 200420:33:23 -0500, János <janos.lobb at yale.edu> wrote:>>> It must be machine orOS dependent.>>>> I re-discovered Hanlon3 method :) and ran it with Bobby'snewest. I>> don't have Bobby's data so I generated random didgits in the0-9 range>>>> Here are the results:>>>> In[28]:=>> v =Table[Random[Integer,>> {0, 9}], {i, 1, 10^7}];>>>> In[29]:=>>Timing[({First[#1],>> Length[#1]} & ) /@>> Split[Sort[First /@>>Split[v]]]]>> Out[29]=>> {35.58*Second, {{0, 898901},>> {1, 899397}, {2,901191},>> {3, 899449!> },!>> {4, 900824},>> {5, 900262}, {6, 899338},>> {7, 900293}, {8, 9>> 00196},>> {9, 901311}}}>>>> In[32]:=>> Timing[({First[#1],>>Length[#1]} & ) /@>> Split[Sort[Split[v][[All,>> 1]]]]]>>Out[32]=>> {38.67999999999998*Second,>> {{0, 898901}, {1, 899397},>>{2, 901191}, {3, 899449},>> {4, 900824}, {5, 900262},! >>! ! > {6, 899338},{7, 900293},>> {8, 900196}, {9, 901311}}}>>>> My machine is a 1.25Ghz G4with 2G Ram and with OSX 10.3.5.>>>> János>> On Nov 5, 2004, at 7:38 PM,DrBob wrote:>>>>> I found an even faster (rather obvious) solution:>>>>>>hanlonTreat[v_] := {First@#, Length@#} & /@ Split@Sort[Split[v][[All,>>>1]]]>>>>>> It about 80% faster than hanlon4.>>>>>> Bobby>>>>>> On Fri, 05Nov 2004 17:16:56 -0600, DrBob <drbob at bigfoot.com> wrote:>>>>>>> I timed theposted methods except Andrzej's -- it's the only one that>>>> works only for+1/-1 data -- plus a couple of my own that I haven't>>>> posted. DavidPark's method seems the same as the fastest method,>>>> hanlon3. I modifiedall methods to return !> a !>> pair {x, number of runs>>>> in x} for each x in the data.>>>>>>>> Two ofBob Hanlon's methods beat all the rest of us -- but one of his>>>> is theslowest method, too.>>>>>>>> I've posted a notebook at the Run Counts linkat:>>>>>>>> http://eclecticdreams.net/DrBob/mathematica! .h! ! > tm>>>>>>>>Bobby>>>>>>>> On Fri, 5 Nov 2004 02:17:54 -0500 (EST), Selwy > n Hollis>>>><sh2.7183 at misspelled.erthlink.net> wrote:>>>>>>>>> Hi Greg,>>>>>>>>>> Thefollowing seems to work pretty well:>>>>>>>>>> runscount[lst_?VectorQ]:=>>>>> Module[{elems, flips, counts},>>>>> elems =Union[lst];>>>>> flips = Cases[Partition[lst, 2, 1], {x_, y_} /; x=!= y];>>>>> counts = {#, Count[Most[flips], {#, _}]} & /@elems;>>>>> {x1, x2} = Last[flips];>>>>> counts /. {{x1,y_} -> {x1, y+1}, {x2, y_} -> {x2, y+1}}]>>>>>>>>>> Example:>>>>>>>>>>Table[Random[Integer, {1, 5}], {20}]>>>>> runscount[%]>>>>>>>>>> {2,2, 3, 1, 3, 2, 2, 3, 1, 1, 2, 3, 1, 1, 3, 1, 1, 2,!> 2!>> , 2}>>>>>>>>>> {{1, 4}, {2, 4}, {3, 5}}>>>>>>>>>>>>>>> ----->>>>>>> Selwyn Hollis>>>>> http://www.appliedsymbols.com>>>>> (edit reply-to toreply)>>>>>>>>>>>>>>> On Nov 4, 2004, at 1:50 AM, Gregory Lypnywrote:>>>>>>>>>>> Looking for an elegant way to count runs to numbers in aseries.>>>>>> Suppose I have a list of ones and negative ones such as>! >>! ! > >>>v={1,1,1,-1,1,1,1,1,1,-1,-1,-1,-1,1}.>>>>>> I'd like to create a functionthat counts the number of runs of 1s>>>>>> and>>>>>> -1s, which in this caseis 3 and 2.>>>>>>>>>>>>Greg>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -->>>DrBob at bigfoot.com>>>www.eclecticdreams.net>>>>>>>>>> ------------------------------------------------------------------->> János Löbb>> Yale University School of Medicine>>Department of Pathology>> Phone: 203-737-5204>> Fax: 203-785-7303>>E-mail: janos.lobb at yale.edu>>>>>>>>>>>> --> DrBob at bigfoot.com>www.eclecticdreams.net>>>>>>>>>>>>>>>>> --> DrBob at bigfoot.com> www.eclecticdreams.net> > > > > -- DrBob at bigfoot.com www.eclecticdreams.net

**Follow-Ups**:**Re: Re: Counting Runs***From:*Andrzej Kozlowski <akoz@mimuw.edu.pl>

**Re: matlab code to mma?**

**execution problem**

**Re: Counting Runs**

**Re: Re: Counting Runs**