perhaps? RE: Finding pattern Matched series
- To: mathgroup at smc.vnet.net
- Subject: [mg33065] perhaps? RE: [mg33051] Finding pattern Matched series
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 1 Mar 2002 06:51:06 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
> -----Original Message----- > From: Jari Curty [SMTP:jabidof at yahoo.fr] To: mathgroup at smc.vnet.net > Sent: Wednesday, February 27, 2002 6:48 AM > To: mathgroup at smc.vnet.net > Subject: [mg33051] Finding pattern Matched series > > Dear Mathgroup, > > Here is the problem: > > Given the following list T = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, > 4}, {4, 1}} extract ALL the series of the following form: > > {{#, x_}, {x_, #}} > {{#, x_}, {x_, y_}, {y_, #}} > {{#, x_}, {x_, y_}, {y_, z_}, {z_, #}} > etc... > > for the for the value # in Range[Max[T]]. > > I guess it is a subtle mixture between Patterns and Maps but I can see it > :-( > > Thanks for your appreciated help! > > Jari > [Hartmut Wolf] Jari, it takes me some guesswork to reach at your problem. And perhaps, I did not reach it at all. However I found a nice exercise in Mathematica programming. Your example data t = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, 4}, {4, 1}}; (Now it would have been nice, had you given any solution examples.) First guess: taken your pattern, and let # denote any instance of a number: ReplaceList[t, {___, {a_, x_}, {x_, b_}, ___} :> {{a, x}, {x, b}}] {{{6, 4}, {4, 1}}} You get one match for your example. However not for the longer subsequences: MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, b_}, ___}] False MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, z_}, {z_, b_}, ___}] False Second guess: taken your pattern, and let # denote the same number for both instances, however take different instances for (your x_, y_, z_ ...) Then ReplaceList[t, {___, {x_, a_}, {b_, x_}, ___} :> {{x, a}, {b, x}}] {{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}} % // Length 3 Gives three occurrances, yet still none in the other cases: ReplaceList[t, {___, {x_, a_}, bb_, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}] {} ReplaceList[t, {___, {x_, a_}, bb__ /; Length[{bb}] == 2, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}] {} Now let's take this as the problem, but reach for more interesting data: SeedRandom[1] tt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000}]; Define a generator for the pattern, with definable "gap" length n : subsequenceRule[n_Integer?NonNegative] := {___, {x_, a_}, bb___ /; Length[{bb}] === n, {c_, x_}, ___} -> {{x, a}, bb, {c, x}} ReplaceList[t, subsequenceRule[0]] {{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}} This is the same result on your data. We compare timings for different lengths of your sequence of pairs: n = 100: ReplaceList[Take[tt, 100], subsequenceRule[0]] // Short // Timing {0.902 Second, {{{3, 3}, {4, 3}}, << 15 >>, {{4, 6}, {5, 4}}}} n = 200: ReplaceList[Take[tt, 200], subsequenceRule[0]] // Short // Timing {5.207 Second, {{{3, 3}, {4, 3}}, << 29 >>, {{6, 3}, {4, 6}}}} (2/0.902)^2.4 6.76046 n = 300: ReplaceList[Take[tt, 300], subsequenceRule[0]] // Short // Timing {15.753 Second, {{{3, 3}, {4, 3}}, << 46 >>, {{1, 4}, {1, 1}}}} (3/0.902)^2.4 17.8894 n = 400: (s0 = ReplaceList[Take[tt, 400], subsequenceRule[0]]) // Short // Timing {35.301 Second, {{{3, 3}, {4, 3}}, << 61 >>, {{4, 4}, {6, 4}}}} (4/0.902)^2.4 35.6819 We see a time complexity of worse than O[n^2], so this will not be feasable for longer sequences (say for looking at DNA). The Timings do not depend much on the gap length: (s1 = ReplaceList[Take[tt, 400], subsequenceRule[1]]) // Short // Timing {36.753 Second, {{{6, 1}, {1, 2}, {5, 6}}, << 46 >>, {{3, 6}, {6, 2}, {3, 3}}}} s2 = ReplaceList[Take[tt, 400], subsequenceRule[2]]; // Timing {36.823 Second, Null} Short[s2, 3] {{{6, 1}, {1, 2}, {5, 6}, {4, 6}}, {{1, 2}, {5, 6}, {4, 6}, {1, 1}}, << 47 >>, {{2, 2}, {1, 4}, {4, 1}, {6, 2}}, {{1, 5}, {3, 5}, {5, 6}, {3, 1}}} s3 = ReplaceList[Take[tt, 400], subsequenceRule[3]]; // Timing {37.884 Second, Null} Short[s3, 3] {{{6, 3}, {3, 3}, {6, 1}, {1, 2}, {5, 6}}, {{4, 6}, {6, 2}, {3, 4}, {3, 5}, {5, 4}}, << 67 >>, {{2, 6}, {4, 2}, {1, 5}, {3, 6}, {6, 2}}} We had found a method to define patterns with variable length of gap. However testing for the length of the "gap" is inefficient _after_ the sequence pattern has matched the gap (there will be too many futile tries). Better define a "static" pattern for this purpose: subsequenceRuleStatic[n_Integer?NonNegative] := With[{gapsymbols = Table[Unique[gap], {2n + 2}]}, (Join[{___}, Map[Pattern[#, Blank[]] &, #, {2}], {___}] -> # &)[ Partition[Join[{x}, gapsymbols, {x}], 2]] ] subsequenceRuleStatic[0] {___, {x_, gap$9_}, {gap$10_, x_}, ___} -> {{x, gap$9}, {gap$10, x}} subsequenceRuleStatic[3] {___, {x_, gap$11_}, {gap$12_, gap$13_}, {gap$14_, gap$15_}, {gap$16_, gap$17_}, {gap$18_, x_}, ___} -> {{x, gap$11}, {gap$12, gap$13}, {gap$14, gap$15}, {gap$16, gap$17}, {gap$18, x}} Unique gives us new symbols for the pattern variables introduced. We get the same matches as before. ss3 = ReplaceList[Take[tt, 400], subsequenceRuleStatic[3]]; // Timing {0.1 Second, Null} s3 == ss3 True We try longer sequences now: SeedRandom[1] ttt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {10000}]; Timing[ReplaceList[Take[ttt, 1000*#], subsequenceRuleStatic[3]]][[1]] & /@ Range[4] {0.531 Second, 2.253 Second, 5.398 Second, 9.043 Second} Better by more than an order of magnitude, but still bad (quadratic) time complexity. Here now a much better solution: subsequenceRulesStaticX[n_Integer?NonNegative] := With[{gapsymbols = Table[Unique[gap], {2n + 2}]}, {(Map[Pattern[#, Blank[]] &, #, {2}] -> # &)[ Partition[Join[{x}, gapsymbols, {x}], 2]], _ :> Sequence[]} ] subsequenceRulesStaticX[1] {{{x_, gap$61_}, {gap$62_, gap$63_}, {gap$64_, x_}} -> {{x, gap$61}, {gap$62, gap$63}, {gap$64, x}}, _ :> Sequence[]} The idea is not to travel along the primary sequence with gap matching BlankNullSequence[] but cut the data into appropriate (short) pieces and test each piece. sss3 = Replace[Partition[Take[tt, 400], 3 + 2, 1], subsequenceRulesStaticX[3], {1}]; // Timing {0.04 Second, Null} ss3 == sss3 True A little bit better for this problem size, but observe... Timing[Replace[Partition[Take[ttt, 1000*#], 3 + 2, 1], subsequenceRulesStaticX[3], {1}]][[1]] & /@ Range[10] {0.08 Second, 0.18 Second, 0.27 Second, 0.351 Second, 0.44 Second, 0.541 Second, 0.621 Second, 0.711 Second, 0.791 Second, 0.882 Second} ... linear time complexity now! You can now check a sequence of a million pairs: t4 = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000000}]; Timing[Length[ Replace[Partition[t4, 3 + 2, 1], subsequenceRulesStaticX[3], {1}] ]] {112.111 Second, 165743} If you have nothing to do with the gap values, then it suffices to match blanks: Timing[Length[ Replace[Partition[t4,3+2,1], {s:{{x_,_},_,_,_,{_,x_}} :> s,_ :> Sequence[]},{1}] ]] {89.469 Second,165743} A seemingly simple solution: m4 = Drop[RotateLeft[t4, #], -4] & /@ Range[0, 4]; res = chainedlist[]; MapThread[ If[First[#1] === Last[#5], res = chainedlist[res, {#1, #2, #3, #4, #5}]] &, mm]; Length[r2 = List @@ Flatten[res]] took all of my machine (real memory, disk space; and, on smaller problems, was slower by at least a factor of 2) If your problem size exceeds a million, then write a little C-program which reads and writes a sequential file. It should run at the speed of your basic I/O. However, if I have not guessed right for your problem, you perhaps might have learnt a trick or another. -- Hartmut Wolf