perhaps? RE: Finding pattern Matched series
- To: mathgroup at smc.vnet.net
- Subject: [mg33065] perhaps? RE: [mg33051] Finding pattern Matched series
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 1 Mar 2002 06:51:06 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
> -----Original Message-----
> From: Jari Curty [SMTP:jabidof at yahoo.fr]
To: mathgroup at smc.vnet.net
> Sent: Wednesday, February 27, 2002 6:48 AM
> To: mathgroup at smc.vnet.net
> Subject: [mg33051] Finding pattern Matched series
>
> Dear Mathgroup,
>
> Here is the problem:
>
> Given the following list T = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6,
> 4}, {4, 1}} extract ALL the series of the following form:
>
> {{#, x_}, {x_, #}}
> {{#, x_}, {x_, y_}, {y_, #}}
> {{#, x_}, {x_, y_}, {y_, z_}, {z_, #}}
> etc...
>
> for the for the value # in Range[Max[T]].
>
> I guess it is a subtle mixture between Patterns and Maps but I can see it
> :-(
>
> Thanks for your appreciated help!
>
> Jari
>
[Hartmut Wolf]
Jari,
it takes me some guesswork to reach at your problem. And perhaps, I did not
reach it at all. However I found a nice exercise in Mathematica programming.
Your example data
t = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, 4}, {4, 1}};
(Now it would have been nice, had you given any solution examples.)
First guess: taken your pattern, and let # denote any instance of a number:
ReplaceList[t, {___, {a_, x_}, {x_, b_}, ___} :> {{a, x}, {x, b}}]
{{{6, 4}, {4, 1}}}
You get one match for your example. However not for the longer subsequences:
MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, b_}, ___}]
False
MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, z_}, {z_, b_}, ___}]
False
Second guess: taken your pattern, and let # denote the same number for both
instances, however take different instances for (your x_, y_, z_ ...) Then
ReplaceList[t, {___, {x_, a_}, {b_, x_}, ___} :> {{x, a}, {b, x}}]
{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}
% // Length
3
Gives three occurrances, yet still none in the other cases:
ReplaceList[t, {___, {x_, a_}, bb_, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]
{}
ReplaceList[t, {___, {x_, a_},
bb__ /; Length[{bb}] == 2, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]
{}
Now let's take this as the problem, but reach for more interesting data:
SeedRandom[1]
tt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000}];
Define a generator for the pattern, with definable "gap" length n :
subsequenceRule[n_Integer?NonNegative] :=
{___, {x_, a_}, bb___ /; Length[{bb}] === n, {c_, x_}, ___}
-> {{x, a}, bb, {c, x}}
ReplaceList[t, subsequenceRule[0]]
{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}
This is the same result on your data.
We compare timings for different lengths of your sequence of pairs:
n = 100:
ReplaceList[Take[tt, 100], subsequenceRule[0]] // Short // Timing
{0.902 Second,
{{{3, 3}, {4, 3}}, << 15 >>, {{4, 6}, {5, 4}}}}
n = 200:
ReplaceList[Take[tt, 200], subsequenceRule[0]] // Short // Timing
{5.207 Second,
{{{3, 3}, {4, 3}}, << 29 >>, {{6, 3}, {4, 6}}}}
(2/0.902)^2.4
6.76046
n = 300:
ReplaceList[Take[tt, 300], subsequenceRule[0]] // Short // Timing
{15.753 Second,
{{{3, 3}, {4, 3}}, << 46 >>, {{1, 4}, {1, 1}}}}
(3/0.902)^2.4
17.8894
n = 400:
(s0 = ReplaceList[Take[tt, 400], subsequenceRule[0]]) // Short // Timing
{35.301 Second,
{{{3, 3}, {4, 3}}, << 61 >>, {{4, 4}, {6, 4}}}}
(4/0.902)^2.4
35.6819
We see a time complexity of worse than O[n^2], so this will not be feasable
for longer sequences (say for looking at DNA).
The Timings do not depend much on the gap length:
(s1 = ReplaceList[Take[tt, 400], subsequenceRule[1]]) // Short // Timing
{36.753 Second,
{{{6, 1}, {1, 2}, {5, 6}}, << 46 >>, {{3, 6}, {6, 2}, {3, 3}}}}
s2 = ReplaceList[Take[tt, 400], subsequenceRule[2]]; // Timing
{36.823 Second, Null}
Short[s2, 3]
{{{6, 1}, {1, 2}, {5, 6}, {4, 6}},
{{1, 2}, {5, 6}, {4, 6}, {1, 1}}, << 47 >>,
{{2, 2}, {1, 4}, {4, 1}, {6, 2}},
{{1, 5}, {3, 5}, {5, 6}, {3, 1}}}
s3 = ReplaceList[Take[tt, 400], subsequenceRule[3]]; // Timing
{37.884 Second, Null}
Short[s3, 3]
{{{6, 3}, {3, 3}, {6, 1}, {1, 2}, {5, 6}},
{{4, 6}, {6, 2}, {3, 4}, {3, 5}, {5, 4}}, << 67 >>,
{{2, 6}, {4, 2}, {1, 5}, {3, 6}, {6, 2}}}
We had found a method to define patterns with variable length of gap.
However testing for the length of the "gap" is inefficient _after_ the
sequence pattern has matched the gap (there will be too many futile tries).
Better define a "static" pattern for this purpose:
subsequenceRuleStatic[n_Integer?NonNegative] :=
With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
(Join[{___}, Map[Pattern[#, Blank[]] &, #, {2}], {___}] -> # &)[
Partition[Join[{x}, gapsymbols, {x}], 2]]
]
subsequenceRuleStatic[0]
{___, {x_, gap$9_}, {gap$10_, x_}, ___} -> {{x, gap$9}, {gap$10, x}}
subsequenceRuleStatic[3]
{___, {x_, gap$11_}, {gap$12_, gap$13_}, {gap$14_, gap$15_}, {gap$16_,
gap$17_}, {gap$18_, x_}, ___} -> {{x, gap$11}, {gap$12,
gap$13}, {gap$14, gap$15}, {gap$16, gap$17}, {gap$18, x}}
Unique gives us new symbols for the pattern variables introduced.
We get the same matches as before.
ss3 = ReplaceList[Take[tt, 400], subsequenceRuleStatic[3]]; // Timing
{0.1 Second, Null}
s3 == ss3
True
We try longer sequences now:
SeedRandom[1]
ttt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {10000}];
Timing[ReplaceList[Take[ttt, 1000*#], subsequenceRuleStatic[3]]][[1]] & /@
Range[4]
{0.531 Second, 2.253 Second, 5.398 Second, 9.043 Second}
Better by more than an order of magnitude, but still bad (quadratic) time
complexity.
Here now a much better solution:
subsequenceRulesStaticX[n_Integer?NonNegative] :=
With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
{(Map[Pattern[#, Blank[]] &, #, {2}] -> # &)[
Partition[Join[{x}, gapsymbols, {x}], 2]], _ :> Sequence[]}
]
subsequenceRulesStaticX[1]
{{{x_, gap$61_}, {gap$62_, gap$63_}, {gap$64_, x_}} -> {{x, gap$61},
{gap$62,
gap$63}, {gap$64, x}}, _ :> Sequence[]}
The idea is not to travel along the primary sequence with gap matching
BlankNullSequence[] but cut the data into appropriate (short) pieces and
test each piece.
sss3 = Replace[Partition[Take[tt, 400], 3 + 2, 1],
subsequenceRulesStaticX[3], {1}]; // Timing
{0.04 Second, Null}
ss3 == sss3
True
A little bit better for this problem size, but observe...
Timing[Replace[Partition[Take[ttt, 1000*#], 3 + 2, 1],
subsequenceRulesStaticX[3], {1}]][[1]] & /@ Range[10]
{0.08 Second, 0.18 Second, 0.27 Second, 0.351 Second, 0.44 Second,
0.541 Second, 0.621 Second, 0.711 Second, 0.791 Second, 0.882 Second}
... linear time complexity now!
You can now check a sequence of a million pairs:
t4 = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000000}];
Timing[Length[
Replace[Partition[t4, 3 + 2, 1], subsequenceRulesStaticX[3], {1}]
]]
{112.111 Second, 165743}
If you have nothing to do with the gap values, then it suffices to match
blanks:
Timing[Length[
Replace[Partition[t4,3+2,1],
{s:{{x_,_},_,_,_,{_,x_}} :> s,_ :> Sequence[]},{1}]
]]
{89.469 Second,165743}
A seemingly simple solution:
m4 = Drop[RotateLeft[t4, #], -4] & /@ Range[0, 4];
res = chainedlist[];
MapThread[
If[First[#1] === Last[#5],
res = chainedlist[res, {#1, #2, #3, #4, #5}]] &, mm];
Length[r2 = List @@ Flatten[res]]
took all of my machine (real memory, disk space; and, on smaller problems,
was slower by at least a factor of 2)
If your problem size exceeds a million, then write a little C-program which
reads and writes a sequential file. It should run at the speed of your basic
I/O.
However, if I have not guessed right for your problem, you perhaps might
have learnt a trick or another.
--
Hartmut Wolf