MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

perhaps? RE: Finding pattern Matched series

  • To: mathgroup at smc.vnet.net
  • Subject: [mg33065] perhaps? RE: [mg33051] Finding pattern Matched series
  • From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
  • Date: Fri, 1 Mar 2002 06:51:06 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

> -----Original Message-----
> From:	Jari Curty [SMTP:jabidof at yahoo.fr]
To: mathgroup at smc.vnet.net
> Sent:	Wednesday, February 27, 2002 6:48 AM
> To:	mathgroup at smc.vnet.net
> Subject:	[mg33051] Finding pattern Matched series
> 
> Dear Mathgroup,
> 
> Here is the problem:
> 
> Given the following list T = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, 
> 4}, {4, 1}} extract ALL the series of the following form:
> 
> {{#, x_}, {x_, #}}
> {{#, x_}, {x_, y_}, {y_, #}}
> {{#, x_}, {x_, y_}, {y_, z_}, {z_, #}}
> etc...
> 
> for the for the value # in Range[Max[T]].
> 
> I guess it is a subtle mixture between Patterns and Maps but I can see it
> :-(
> 
> Thanks for your appreciated help!
> 
> Jari
> 
[Hartmut Wolf]  


Jari,

it takes me some guesswork to reach at your problem. And perhaps, I did not
reach it at all. However I found a nice exercise in Mathematica programming.

Your example data

t = {{1, 2}, {3, 4}, {2, 3}, {5, 6}, {4, 5}, {6, 4}, {4, 1}};

(Now it would have been nice, had you given any solution examples.)



First guess: taken your pattern, and let # denote any instance of a number:

ReplaceList[t, {___, {a_, x_}, {x_, b_}, ___} :> {{a, x}, {x, b}}]

{{{6, 4}, {4, 1}}}
 
You get one match for your example. However not for the longer subsequences:

MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, b_}, ___}]
False

MatchQ[t, {___, {a_, x_}, {x_, y_}, {y_, z_}, {z_, b_}, ___}]
False



Second guess: taken your pattern, and let # denote the same number for both
instances, however take different instances for (your x_, y_, z_ ...) Then

 
ReplaceList[t, {___, {x_, a_}, {b_, x_}, ___} :> {{x, a}, {b, x}}]

{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}


% // Length

3

Gives three occurrances, yet still none in the other cases:


ReplaceList[t, {___, {x_, a_}, bb_, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]

{}


ReplaceList[t, {___, {x_, a_}, 
      bb__ /; Length[{bb}] == 2, {c_, x_}, ___} :> {{x, a}, bb, {c, x}}]
{}
 



Now let's take this as the problem, but reach for more interesting data:

SeedRandom[1]
tt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000}];
 

Define a generator for the pattern, with definable "gap" length n :


subsequenceRule[n_Integer?NonNegative] := 
   {___, {x_, a_}, bb___ /; Length[{bb}] === n, {c_, x_}, ___}
    -> {{x, a}, bb, {c, x}}


ReplaceList[t, subsequenceRule[0]]

{{{3, 4}, {2, 3}}, {{5, 6}, {4, 5}}, {{4, 5}, {6, 4}}}
 

This is the same result on your data.

We compare timings for different lengths of your sequence of pairs:


n = 100:

ReplaceList[Take[tt, 100], subsequenceRule[0]] // Short // Timing

{0.902 Second, 
  {{{3, 3}, {4, 3}}, << 15 >>, {{4, 6}, {5, 4}}}}


n = 200:

ReplaceList[Take[tt, 200], subsequenceRule[0]] // Short // Timing

{5.207 Second,
  {{{3, 3}, {4, 3}}, << 29 >>, {{6, 3}, {4, 6}}}}

(2/0.902)^2.4

6.76046


n = 300:

ReplaceList[Take[tt, 300], subsequenceRule[0]] // Short // Timing

{15.753 Second,
  {{{3, 3}, {4, 3}}, << 46 >>, {{1, 4}, {1, 1}}}}

(3/0.902)^2.4

17.8894


n = 400:

(s0 = ReplaceList[Take[tt, 400], subsequenceRule[0]]) // Short // Timing

{35.301 Second, 
  {{{3, 3}, {4, 3}}, << 61 >>, {{4, 4}, {6, 4}}}}

(4/0.902)^2.4

35.6819


We see a time complexity of worse than O[n^2], so this will not be feasable
for longer sequences (say for looking at DNA).

The Timings do not depend much on the gap length:
 

(s1 = ReplaceList[Take[tt, 400], subsequenceRule[1]]) // Short // Timing

{36.753 Second, 
  {{{6, 1}, {1, 2}, {5, 6}}, << 46 >>, {{3, 6}, {6, 2}, {3, 3}}}}


s2 = ReplaceList[Take[tt, 400], subsequenceRule[2]]; // Timing

{36.823 Second, Null}

Short[s2, 3]

{{{6, 1}, {1, 2}, {5, 6}, {4, 6}}, 
 {{1, 2}, {5, 6}, {4, 6}, {1, 1}}, << 47 >>, 
 {{2, 2}, {1, 4}, {4, 1}, {6, 2}}, 
 {{1, 5}, {3, 5}, {5, 6}, {3, 1}}}


s3 = ReplaceList[Take[tt, 400], subsequenceRule[3]]; // Timing

{37.884 Second, Null}

Short[s3, 3]

{{{6, 3}, {3, 3}, {6, 1}, {1, 2}, {5, 6}}, 
 {{4, 6}, {6, 2}, {3, 4}, {3, 5}, {5, 4}}, << 67 >>, 
 {{2, 6}, {4, 2}, {1, 5}, {3, 6}, {6, 2}}}
 

We had found a method to define patterns with variable length of gap.
However testing for the length of the "gap" is inefficient _after_ the
sequence pattern has matched the gap (there will be too many futile tries).


Better define a "static" pattern for this purpose:

subsequenceRuleStatic[n_Integer?NonNegative] := 
  With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
    (Join[{___}, Map[Pattern[#, Blank[]] &, #, {2}], {___}] -> # &)[
      Partition[Join[{x}, gapsymbols, {x}], 2]]
    ]


subsequenceRuleStatic[0]

{___, {x_, gap$9_}, {gap$10_, x_}, ___} -> {{x, gap$9}, {gap$10, x}}

subsequenceRuleStatic[3]

{___, {x_, gap$11_}, {gap$12_, gap$13_}, {gap$14_, gap$15_}, {gap$16_, 
      gap$17_}, {gap$18_, x_}, ___} -> {{x, gap$11}, {gap$12, 
      gap$13}, {gap$14, gap$15}, {gap$16, gap$17}, {gap$18, x}}
 
Unique gives us new symbols for the pattern variables introduced.



We get the same matches as before.

ss3 = ReplaceList[Take[tt, 400], subsequenceRuleStatic[3]]; // Timing

{0.1 Second, Null}

s3 == ss3

True
 

We try longer sequences now:

SeedRandom[1]

ttt = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {10000}];


Timing[ReplaceList[Take[ttt, 1000*#], subsequenceRuleStatic[3]]][[1]] & /@ 
  Range[4]

{0.531 Second, 2.253 Second, 5.398 Second, 9.043 Second}
 
Better by more than an order of magnitude, but still bad (quadratic) time
complexity.



Here now a much better solution:

subsequenceRulesStaticX[n_Integer?NonNegative] := 
  With[{gapsymbols = Table[Unique[gap], {2n + 2}]},
    {(Map[Pattern[#, Blank[]] &, #, {2}] -> # &)[
        Partition[Join[{x}, gapsymbols, {x}], 2]], _ :> Sequence[]}
    ]


subsequenceRulesStaticX[1]

{{{x_, gap$61_}, {gap$62_, gap$63_}, {gap$64_, x_}} -> {{x, gap$61},
{gap$62, 
        gap$63}, {gap$64, x}}, _ :> Sequence[]}

The idea is not to travel along the primary sequence with gap matching
BlankNullSequence[] but cut the data into appropriate (short) pieces and
test each piece.


sss3 = Replace[Partition[Take[tt, 400], 3 + 2, 1], 
        subsequenceRulesStaticX[3], {1}]; // Timing

{0.04 Second, Null}

ss3 == sss3

True


A little bit better for this problem size, but observe...

Timing[Replace[Partition[Take[ttt, 1000*#], 3 + 2, 1], 
          subsequenceRulesStaticX[3], {1}]][[1]] & /@ Range[10]

{0.08 Second, 0.18 Second, 0.27 Second, 0.351 Second, 0.44 Second, 
  0.541 Second, 0.621 Second, 0.711 Second, 0.791 Second, 0.882 Second}

... linear time complexity now!


You can now check a sequence of a million pairs:

t4 = Table[{Random[Integer, {1, 6}], Random[Integer, {1, 6}]}, {1000000}];

Timing[Length[
    Replace[Partition[t4, 3 + 2, 1], subsequenceRulesStaticX[3], {1}]
      ]]

{112.111 Second, 165743}



If you have nothing to do with the gap values, then it suffices to match
blanks:

Timing[Length[
    Replace[Partition[t4,3+2,1],
            {s:{{x_,_},_,_,_,{_,x_}} :> s,_ :> Sequence[]},{1}]
    ]]

{89.469 Second,165743}



A seemingly simple solution:

m4 = Drop[RotateLeft[t4, #], -4] & /@ Range[0, 4];

res = chainedlist[];

MapThread[
    If[First[#1] === Last[#5], 
        res = chainedlist[res, {#1, #2, #3, #4, #5}]] &, mm];

Length[r2 = List @@ Flatten[res]]

took all of my machine (real memory, disk space; and, on smaller problems,
was slower by at least a factor of 2)


If your problem size exceeds a million, then write a little C-program which
reads and writes a sequential file. It should run at the speed of your basic
I/O.

However, if I have not guessed right for your problem, you perhaps might
have learnt a trick or another.

--
Hartmut Wolf



  • Prev by Date: Re: Mathematica have an 'Assume' type syntax/command?
  • Next by Date: Re: Intersection[...,SameTest] ?
  • Previous by thread: RE: perhaps? RE: Finding pattern Matched series
  • Next by thread: Re: Solve (on Mathematica 4.1 Windows XP)?