MathGroup Archive 2011

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: "set" data structure in Mathematica? (speeding up graph traversal function)

  • To: mathgroup at smc.vnet.net
  • Subject: [mg117751] Re: "set" data structure in Mathematica? (speeding up graph traversal function)
  • From: David Bevan <david.bevan at pb.com>
  • Date: Thu, 31 Mar 2011 03:56:52 -0500 (EST)

With this solution, if more than one graph is to be analysed then the memoized functions that don't take g as a parameter must be Cleared between uses otherwise the results will be incorrect. I prefer to take the cautious approach and write properly encapsulated code that removes such a risk.

Presumably there is some way of handling it automatically that would retain the performance increase (perhaps by checking the DownValues of the main function to see if the graph has changed). But that assumes that you never want to call the subsidiary functions directly.

It all depends on use / reuse requirements.

David %^>


-----Original Message-----
From: DrMajorBob [mailto:btreat1 at austin.rr.com]
Sent: 29 March 2011 20:30
To: mathgroup at smc.vnet.net; David Bevan
Cc: koopman at sfu.ca; szhorvat at gmail.com
Subject: [mg117751] Re: [mg117635] Re: "set" data structure in Mathematica? (speeding up

That solution is remarkable, 3 times faster than my 2x improvement on Ray 
Koopman's code!

Below is another 60% speed-up, mostly from avoiding pattern matching on 
the argument "g", which doesn't change.

I also eliminated some "memoization" that wasn't saving time:

ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \
gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3]
g = graph; (* you called this gtest *)
gReverse = Reverse[g, 2];
gNodes = Union@Flatten[g /. Rule -> List];
gSinkNodes = Complement[gNodes, Union[First /@ g]];

gNext[s_] := gNext[s] = ReplaceList[s, g]
gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse]
gReachableFrom[s_] :=
  gReachableFrom[s] =
   FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}]
gCanReach[t_] :=
  gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}]
gCanReach[t_, x_] :=
  gCanReach[t, x] =
   FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}]
gBetween[s_, t_] :=
  gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]]
gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]]
gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1]
gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x]
gPaths[s_, t_, x_] := {}
gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]]
gPaths3[s_, s_, x_] := {{s}}
gPaths3[s_, t_, x_] :=
  gPaths3[s, t, x] = (Prepend[#, s] &) /@
    Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@
      Complement[gNext[s, t], x], 1]

Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing

{0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32,
   394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}}

Your code's timing was 0.15977 at this machine.

Bobby

On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com> wrote:

> Here's an improvement:
>
> gCanReach[g,t,x] gives a list of the nodes from which there is a path to 
> t avoiding nodes in list x.
> gBetween[g,s,t] gives a list of the nodes that can be reached along a 
> path from s to t which does not revisit s.
> gNext[g,s,t] gives a list of the nodes one step from s towards t.
>
> By using these, we avoid searching down "blind alleys".
>
>
> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween]
>
> gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List]
>
> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>
> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>
> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>
> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>
> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>
> gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}]
>
> gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]]
>
> gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]]
>
>
>
> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>
> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>
> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>
> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>
> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>
> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1]
>
>
>
> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>
> {0.203,47165}
>
>
>
> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>
> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>
> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>
> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>
> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>
> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]]
>
>
>
> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>
> {0.093,47165}
>
>
> This is now about 100 times faster than your original algorithm for the 
> sample data you provided.
>
> David %^>
>
>
> ________________________________
> From: David Bevan
> Sent: 25 March 2011 21:35
> To: mathgroup at smc.vnet.net; szhorvat at gmail.com
> Subject: [mg117635] RE: [mg117576] Re: "set" data structure in 
> Mathematica? (speeding up
>
>
>
>
> Your algorithm seems to be counting the number of distinct acyclic paths 
> from a specified node to any sink node (maximal paths below).
>
>
>
> The following seems to be faster than any solution proposed so far.
>
>
>
> gReachableFrom[g,s] gives a list of the nodes that can be reached along 
> a path from s.
>
> gCanReach[g,t] gives a list of the nodes from which there is a path to t.
>
>
>
> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach]
>
> gNodes[g_]:=Union@Flatten[g/.Rule->List]
>
> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>
> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>
> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>
> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>
> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>
>
>
> gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in 
> the list x.
>
> gPaths2 'normalises' x before calling gPaths3. This is the key to the 
> performance.
>
>
>
> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>
> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>
> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>
> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>
> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>
> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1]
>
>
>
> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>
> {0.328,47165}
>
>
>
> This compares with about 2 seconds for DrMajorBob's algorithms on my PC.
>
>
>
> Counting without listing all the paths is a little faster.
>
>
>
> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>
> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>
> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>
> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>
> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>
> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]]
>
>
>
> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>
> {0.203,47165}
>
>
>
> Since all the intermeidate results are saved, a second run is much 
> faster.
>
>
>
> David %^>


--
DrMajorBob at yahoo.com


  • Prev by Date: Re: Filling Plots to the X-Axis for a Range of X Values
  • Next by Date: NDSolve and Plot
  • Previous by thread: Re: "set" data structure in Mathematica? (speeding up graph traversal function)
  • Next by thread: Re: "set" data structure in Mathematica? (speeding up graph traversal function)