perhaps? RE: Finding pattern Matched series

• To: mathgroup at smc.vnet.net
• Subject: [mg33065] perhaps? RE: [mg33051] Finding pattern Matched series
• From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
• Date: Fri, 1 Mar 2002 06:51:06 -0500 (EST)
• Sender: owner-wri-mathgroup at wolfram.com

```> -----Original Message-----
> From:	Jari Curty [SMTP:jabidof at yahoo.fr]
To: mathgroup at smc.vnet.net
> Sent:	Wednesday, February 27, 2002 6:48 AM
> To:	mathgroup at smc.vnet.net
> Subject:	[mg33051] Finding pattern Matched series
>
> Dear Mathgroup,
>
> Here is the problem:
>
> Given the following list T = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6,
> 4}, {4, 1}} extract ALL the series of the following form:
>
> {{#, x_}, {x_, #}}
> {{#, x_}, {x_, y_}, {y_, #}}
> {{#, x_}, {x_, y_}, {y_, z_}, {z_, #}}
> etc...
>
> for the for the value # in Range[Max[T]].
>
> I guess it is a subtle mixture between Patterns and Maps but I can see it
> :-(
>
> Thanks for your appreciated help!
>
> Jari
>
[Hartmut Wolf]

Jari,

it takes me some guesswork to reach at your problem. And perhaps, I did not
reach it at all. However I found a nice exercise in Mathematica programming.

t = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, 4}, {4, 1}};

(Now it would have been nice, had you given any solution examples.)

First guess: taken your pattern, and let # denote any instance of a number:

ReplaceList[t, {___, {a_, x_}, {x_, b_}, ___} :> {{a, x}, {x, b}}]

{{{6, 4}, {4, 1}}}

You get one match for your example. However not for the longer subsequences:

MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, b_}, ___}]
False

MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, z_}, {z_, b_}, ___}]
False

Second guess: taken your pattern, and let # denote the same number for both
instances, however take different instances for (your x_, y_, z_ ...) Then

ReplaceList[t, {___, {x_, a_}, {b_, x_}, ___} :> {{x, a}, {b, x}}]

{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}

% // Length

3

Gives three occurrances, yet still none in the other cases:

ReplaceList[t, {___, {x_, a_}, bb_, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]

{}

ReplaceList[t, {___, {x_, a_},
bb__ /; Length[{bb}] == 2, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]
{}

Now let's take this as the problem, but reach for more interesting data:

SeedRandom[1]
tt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000}];

Define a generator for the pattern, with definable "gap" length n :

subsequenceRule[n_Integer?NonNegative] :=
{___, {x_, a_}, bb___ /; Length[{bb}] === n, {c_, x_}, ___}
-> {{x, a}, bb, {c, x}}

ReplaceList[t, subsequenceRule[0]]

{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}

This is the same result on your data.

We compare timings for different lengths of your sequence of pairs:

n = 100:

ReplaceList[Take[tt, 100], subsequenceRule[0]] // Short // Timing

{0.902 Second,
{{{3, 3}, {4, 3}}, << 15 >>, {{4, 6}, {5, 4}}}}

n = 200:

ReplaceList[Take[tt, 200], subsequenceRule[0]] // Short // Timing

{5.207 Second,
{{{3, 3}, {4, 3}}, << 29 >>, {{6, 3}, {4, 6}}}}

(2/0.902)^2.4

6.76046

n = 300:

ReplaceList[Take[tt, 300], subsequenceRule[0]] // Short // Timing

{15.753 Second,
{{{3, 3}, {4, 3}}, << 46 >>, {{1, 4}, {1, 1}}}}

(3/0.902)^2.4

17.8894

n = 400:

(s0 = ReplaceList[Take[tt, 400], subsequenceRule[0]]) // Short // Timing

{35.301 Second,
{{{3, 3}, {4, 3}}, << 61 >>, {{4, 4}, {6, 4}}}}

(4/0.902)^2.4

35.6819

We see a time complexity of worse than O[n^2], so this will not be feasable
for longer sequences (say for looking at DNA).

The Timings do not depend much on the gap length:

(s1 = ReplaceList[Take[tt, 400], subsequenceRule[1]]) // Short // Timing

{36.753 Second,
{{{6, 1}, {1, 2}, {5, 6}}, << 46 >>, {{3, 6}, {6, 2}, {3, 3}}}}

s2 = ReplaceList[Take[tt, 400], subsequenceRule[2]]; // Timing

{36.823 Second, Null}

Short[s2, 3]

{{{6, 1}, {1, 2}, {5, 6}, {4, 6}},
{{1, 2}, {5, 6}, {4, 6}, {1, 1}}, << 47 >>,
{{2, 2}, {1, 4}, {4, 1}, {6, 2}},
{{1, 5}, {3, 5}, {5, 6}, {3, 1}}}

s3 = ReplaceList[Take[tt, 400], subsequenceRule[3]]; // Timing

{37.884 Second, Null}

Short[s3, 3]

{{{6, 3}, {3, 3}, {6, 1}, {1, 2}, {5, 6}},
{{4, 6}, {6, 2}, {3, 4}, {3, 5}, {5, 4}}, << 67 >>,
{{2, 6}, {4, 2}, {1, 5}, {3, 6}, {6, 2}}}

We had found a method to define patterns with variable length of gap.
However testing for the length of the "gap" is inefficient _after_ the
sequence pattern has matched the gap (there will be too many futile tries).

Better define a "static" pattern for this purpose:

subsequenceRuleStatic[n_Integer?NonNegative] :=
With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
(Join[{___}, Map[Pattern[#, Blank[]] &, #, {2}], {___}] -> # &)[
Partition[Join[{x}, gapsymbols, {x}], 2]]
]

subsequenceRuleStatic[0]

{___, {x_, gap\$9_}, {gap\$10_, x_}, ___} -> {{x, gap\$9}, {gap\$10, x}}

subsequenceRuleStatic[3]

{___, {x_, gap\$11_}, {gap\$12_, gap\$13_}, {gap\$14_, gap\$15_}, {gap\$16_,
gap\$17_}, {gap\$18_, x_}, ___} -> {{x, gap\$11}, {gap\$12,
gap\$13}, {gap\$14, gap\$15}, {gap\$16, gap\$17}, {gap\$18, x}}

Unique gives us new symbols for the pattern variables introduced.

We get the same matches as before.

ss3 = ReplaceList[Take[tt, 400], subsequenceRuleStatic[3]]; // Timing

{0.1 Second, Null}

s3 == ss3

True

We try longer sequences now:

SeedRandom[1]

ttt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {10000}];

Timing[ReplaceList[Take[ttt, 1000*#], subsequenceRuleStatic[3]]][[1]] & /@
Range[4]

{0.531 Second, 2.253 Second, 5.398 Second, 9.043 Second}

Better by more than an order of magnitude, but still bad (quadratic) time
complexity.

Here now a much better solution:

subsequenceRulesStaticX[n_Integer?NonNegative] :=
With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
{(Map[Pattern[#, Blank[]] &, #, {2}] -> # &)[
Partition[Join[{x}, gapsymbols, {x}], 2]], _ :> Sequence[]}
]

subsequenceRulesStaticX[1]

{{{x_, gap\$61_}, {gap\$62_, gap\$63_}, {gap\$64_, x_}} -> {{x, gap\$61},
{gap\$62,
gap\$63}, {gap\$64, x}}, _ :> Sequence[]}

The idea is not to travel along the primary sequence with gap matching
BlankNullSequence[] but cut the data into appropriate (short) pieces and
test each piece.

sss3 = Replace[Partition[Take[tt, 400], 3 + 2, 1],
subsequenceRulesStaticX[3], {1}]; // Timing

{0.04 Second, Null}

ss3 == sss3

True

A little bit better for this problem size, but observe...

Timing[Replace[Partition[Take[ttt, 1000*#], 3 + 2, 1],
subsequenceRulesStaticX[3], {1}]][[1]] & /@ Range[10]

{0.08 Second, 0.18 Second, 0.27 Second, 0.351 Second, 0.44 Second,
0.541 Second, 0.621 Second, 0.711 Second, 0.791 Second, 0.882 Second}

... linear time complexity now!

You can now check a sequence of a million pairs:

t4 = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000000}];

Timing[Length[
Replace[Partition[t4, 3 + 2, 1], subsequenceRulesStaticX[3], {1}]
]]

{112.111 Second, 165743}

If you have nothing to do with the gap values, then it suffices to match
blanks:

Timing[Length[
Replace[Partition[t4,3+2,1],
{s:{{x_,_},_,_,_,{_,x_}} :> s,_ :> Sequence[]},{1}]
]]

{89.469 Second,165743}

A seemingly simple solution:

m4 = Drop[RotateLeft[t4, #], -4] & /@ Range[0, 4];

res = chainedlist[];

If[First[#1] === Last[#5],
res = chainedlist[res, {#1, #2, #3, #4, #5}]] &, mm];

Length[r2 = List @@ Flatten[res]]

took all of my machine (real memory, disk space; and, on smaller problems,
was slower by at least a factor of 2)

If your problem size exceeds a million, then write a little C-program which
reads and writes a sequential file. It should run at the speed of your basic
I/O.

However, if I have not guessed right for your problem, you perhaps might
have learnt a trick or another.

--
Hartmut Wolf

```

• Prev by Date: Re: Mathematica have an 'Assume' type syntax/command?
• Next by Date: Re: Intersection[...,SameTest] ?
• Previous by thread: RE: perhaps? RE: Finding pattern Matched series
• Next by thread: Re: Solve (on Mathematica 4.1 Windows XP)?