RE: Re: Re: Pattern Matching in Lists

*To*: mathgroup at smc.vnet.net*Subject*: [mg35675] RE: [mg35633] Re: [mg35586] Re: Pattern Matching in Lists*From*: "DrBob" <majort at cox-internet.com>*Date*: Thu, 25 Jul 2002 04:46:13 -0400 (EDT)*Reply-to*: <drbob at bigfoot.com>*Sender*: owner-wri-mathgroup at wolfram.com

Fred's two solutions (both ingenious!) are simons1 and simons2 in the code below. Allan Hayes' f5 is about ten times as fast as simons1, but simons2 beat f5 with a p-value of 5.4% over 100 lists (for the lists of 2 million that I'm using). On another trial with 200 lists, simons2 beat f5 by 8.4% with a p-value of .00009. The % difference in Timing had the following 95% confidence interval: << "Statistics`ConfidenceIntervals`" MeanCI[(t1 - t2)/Mean[t1]] {0.040925889018143896, 0.12796682963766004} Hence simons2 is the winner by a margin between 4% and 13%. Of course, it's possible that lists of different lengths or different distributions of 0's and 1's would give different results. I also replaced If with Which to get simons3. This made no real difference, as I should have expected! (For 200 trials, I got a difference of 0.8% and a p-value of 38.3%.) Bobby Needs["Statistics`HypothesisTests`"] f1[w_] := Count[Partition[w, 2, 1], {1, 0}] f2[w_] := Count[Drop[w, -1] - Drop[w, 1], 1] (* Allan Hayes *) f3[w_] := Count[Drop[w + 2RotateRight[w], 1], 2] f4[w_] := Tr[Drop[w, -1](Drop[w, -1] - Drop[w, 1])] f5[w_] := (Tr[w] - Tr[BitAnd[w, RotateLeft[w]]]) + If[w[[1]] == 0 && w[[-1]] == 1, -1, 0] (* Carl K. Woll, Allan Hayes *) f6[w_] := (Tr[#] - Tr[# Drop[w, 1]]) &[Drop[w, -1]] (* Allan Hayes *) f7 = Compile[{{w, _Integer, 1}}, Count[Drop[w, -1] - Drop[w, 1], 1]] (* Andrzej Kozlowski *); f8[w_List] /; Last[w] == 1 && First[w] == 0 := Count[w - RotateLeft[w], 1]; f8[w_List] := Count[Drop[w, -1] - Drop[w, 1], 1] (* Andrzej *); f9[w_] := Count[w - RotateLeft[w], 1] (* Andrzej *) simons1[w_] := Count[ListCorrelate[{1, -1}, w], 1] simons2[lst_] := (Tr[BitXor[lst, RotateLeft[lst]]] + If[lst[[1]] == 1, 0, If[lst[[-1]] == 1, -2, 0]])/2 simons3[w_] := (Tr[BitXor[w, RotateLeft[w]]] + Which[w[[1]] == 1, 0, w[[-1]] == 1, -2, True, 0])/2 trial[f_, g_] := (n = 2000000; w = Table[Random[Integer], {n}]; First@Timing[#[w];]/Second & /@ {f, g} ) test[f_, g_, n_Integer] := ( {t1, t2} = Transpose[trial[f, g] & /@ Range[n]]; Print[Mean /@ {t1, t2}]; Print[r = MeanTest[t1 - t2, 0, FullReport -> True]]; meanDiff = (FullReport /. r)[[1, 1, 1]]; Print["% difference = ", meanDiff/Mean[t1]]) test[f5, simons1, 40] {0.07045000000000243, 0.7038500000000006} {FullReport -> TableForm[{{"Mean", "TestStat", "Distribution"}, {-0.6333999999999982, -122.74356937717243, StudentTDistribution[39]}}, TableHeadings -> {None, {"Mean", "TestStat", "Distribution"}}], OneSidedPValue -> 2.1679176203297467*^-52} % difference = -8.990773598 test[f5, simons2, 100] {0.06879000000000246, 0.0654299999999978} {FullReport -> TableForm[{{"Mean", "TestStat", "Distribution"}, {0.003360000000004675, 1.6229487334543304, StudentTDistribution[99]}}, TableHeadings -> {None, {"Mean", "TestStat", "Distribution"}}], OneSidedPValue -> 0.053890293563366165} % difference = 0.04884430876587519 Here's timing for one list, for all 12 solutions: Timing[#[w]]/{Second, 1} & /@ {f1, f2, f3, f4, f5, f6, f7, f8, f9, simons1, simons2, simons3} {{1.266, 500515}, {0.547, 500515}, {0.5, 500515}, {0.171, 500515}, {0.063, 500515}, {0.141, 500515}, {0.14, 500515}, {0.516, 500516}, {0.515, 500516}, {0.672, 500515}, {0.047, 500515}, {0.078, 500515}} Bobby Treat -----Original Message----- From: Fred Simons [mailto:f.h.simons at tue.nl] To: mathgroup at smc.vnet.net Subject: [mg35675] [mg35633] Re: [mg35586] Re: Pattern Matching in Lists Many postings have appeared how to find the number of occurences (1,0) in a sequence consisting of zeros and ones. My interest in these postings is mainly in the various techniques used to solve the problem rather than the speed of the solution, though I immediately agree that a short and elegant solution usually is fast as well. No doubt the shortest and most elegant solution is found by splitting the list into pairs and simply count the number of lists {1,0}. But that turns out not to be the fastest solution. The idea behind all other solutions is to look at the sequence of successive differences and then count the number 1. So how to find this list of differences? Two ways have been demonstrated: Drop[lst, -1] - Drop[lst, 1] Drop[lst - RotateLeft[lst], 1] These two solutions are about as fast. Andrzej Kozlowsky noted that dropping the last element in the second solution is only necessary when the list has the structure {0, ..., 1} which for long lists may give some gain of speed. There is a third way, as fast as the two others: ListCorrelate[{1,-1}, lst ] The list of differences consists of the elements 1, 0 and -1. Instead of using Count for counting the number of ones, Carl Woll in an ingenious way replaced the elements -1 by 0 and then used the trace function to count the number of ones. Successive improvements in the implementation of this idea bij Allan Hayes and Carl Woll lead to an amazing fast solution by Allan Hayes. Still another approach is possible, based on the observation that between any two occurences of {1,0} in the list there must be an occurence of {0,1}. Therefore when the list starts with 1 and ends with 0, the number of 1's in the list of diffences is one more than the number of -1's, etc. Using the same techniques as developed by Carl Woll and Allan Hayes we arrive at the following solution: (Tr[BitXor[lst, RotateLeft[lst]]] + If[lst[[1]]==1, 0,If[lst[[-1]]==1, -2,0]]) / 2 Here is a timing for a list of length 5 10^6, compared with Alan's solution: In[126]:= lst = Table[Random[Integer], {5000000}]; (Tr[lst]-Tr[BitAnd[lst,RotateLeft[lst]]])+ If[lst[[1]]\[Equal]0&&lst[[-1]]\[Equal]1,-1,0]//Timing (Tr[BitXor[lst, RotateLeft[lst]]]+ If[lst[[1]]\[Equal]1, 0,If[lst[[-1]]\[Equal]1, -2,0]]) / 2 // Timing Out[127]= {1.42 Second,1249704} Out[128]= {0.88 Second,1249704} Fred Simons Eindhoven University of Technology