Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- To: mathgroup at smc.vnet.net
- Subject: [mg117774] Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Thu, 31 Mar 2011 04:01:01 -0500 (EST)
It's not hard to "properly" encapsulate the code, either way. For instance:
acyclicPaths[g_] :=
Module[{forward, reverse, sources, nodes, sinks, to, from, next,
last, reachableFrom, canReach, between, maximalPaths, paths,
paths2, paths3},
forward = Dispatch@g;
reverse = Dispatch@Reverse[g, 2];
sources = Union@g[[All, 1]];
nodes = Union@Flatten[g /. Rule -> List];
sinks = Complement[nodes, sources];
reachableFrom[s_] :=
reachableFrom[s] = FixedPoint[Union @@ Join[{#}, next /@ #] &, {s}];
canReach[t_, x_] :=
canReach[t, x] = FixedPoint[Union @@ Join[{#}, last /@ #] &, {t}];
between[s_, t_] :=
between[s, t] = Intersection[reachableFrom@s, canReach[t, {s}]];
next[s_] := next[s] = ReplaceList[s, forward];
last[s_] := last[s] = ReplaceList[s, reverse];
next[s_, t_] := Intersection[next@s, between[s, t]];
maximalPaths[s_] := Flatten[(paths[s, #, {}] &) /@ sinks, 1];
paths[s_, t_, x_] /; MemberQ[reachableFrom@s, t] :=
paths3[s, t, Intersection[x, between[s, t]]];
paths[s_, t_, x_] = {};
paths2[s_, t_, x_] := paths3[s, t, Intersection[x, between[s, t]]];
paths3[s_, s_, x_] := {{s}};
paths3[s_, t_, x_] :=
paths3[s, t, x] = (Prepend[#, s] &) /@
Flatten[(paths2[#, t, Prepend[x, s]] &) /@
Complement[next[s, t], x], 1];
maximalPaths /@ sources
]
Length /@ (gtest = acyclicPaths@graph) // Timing
{0.128763, {1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, 394,
355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}}
If it's necessary to save memoized functions, that can also be done.
Bobby
On Wed, 30 Mar 2011 02:56:31 -0500, David Bevan <david.bevan at pb.com> wrote:
>
> With this solution, if more than one graph is to be analysed then the
> memoized functions that don't take g as a parameter must be Cleared
> between uses otherwise the results will be incorrect. I prefer to take
> the cautious approach and write properly encapsulated code that removes
> such a risk.
>
> Presumably there is some way of handling it automatically that would
> retain the performance increase (perhaps by checking the DownValues of
> the main function to see if the graph has changed). But that assumes
> that you never want to call the subsidiary functions directly.
>
> It all depends on use / reuse requirements.
>
> David %^>
>
>
> -----Original Message-----
> From: DrMajorBob [mailto:btreat1 at austin.rr.com]
> Sent: 29 March 2011 20:30
> To: mathgroup at smc.vnet.net; David Bevan
> Cc: koopman at sfu.ca; szhorvat at gmail.com
> Subject: Re: [mg117635] Re: "set" data structure in Mathematica?
> (speeding up
>
> That solution is remarkable, 3 times faster than my 2x improvement on Ray
> Koopman's code!
>
> Below is another 60% speed-up, mostly from avoiding pattern matching on
> the argument "g", which doesn't change.
>
> I also eliminated some "memoization" that wasn't saving time:
>
> ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \
> gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3]
> g = graph; (* you called this gtest *)
> gReverse = Reverse[g, 2];
> gNodes = Union@Flatten[g /. Rule -> List];
> gSinkNodes = Complement[gNodes, Union[First /@ g]];
>
> gNext[s_] := gNext[s] = ReplaceList[s, g]
> gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse]
> gReachableFrom[s_] :=
> gReachableFrom[s] =
> FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}]
> gCanReach[t_] :=
> gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}]
> gCanReach[t_, x_] :=
> gCanReach[t, x] =
> FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}]
> gBetween[s_, t_] :=
> gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]]
> gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]]
> gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1]
> gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x]
> gPaths[s_, t_, x_] := {}
> gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]]
> gPaths3[s_, s_, x_] := {{s}}
> gPaths3[s_, t_, x_] :=
> gPaths3[s, t, x] = (Prepend[#, s] &) /@
> Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@
> Complement[gNext[s, t], x], 1]
>
> Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing
>
> {0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32,
> 394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}}
>
> Your code's timing was 0.15977 at this machine.
>
> Bobby
>
> On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com>
> wrote:
>
>> Here's an improvement:
>>
>> gCanReach[g,t,x] gives a list of the nodes from which there is a path to
>> t avoiding nodes in list x.
>> gBetween[g,s,t] gives a list of the nodes that can be reached along a
>> path from s to t which does not revisit s.
>> gNext[g,s,t] gives a list of the nodes one step from s towards t.
>>
>> By using these, we avoid searching down "blind alleys".
>>
>>
>> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween]
>>
>> gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List]
>>
>> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>>
>> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>>
>> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>>
>> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>>
>> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>>
>> gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}]
>>
>> gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]]
>>
>> gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]]
>>
>>
>>
>> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>>
>> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>>
>> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>>
>> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>>
>> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>>
>> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1]
>>
>>
>>
>> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>>
>> {0.203,47165}
>>
>>
>>
>> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>>
>> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>>
>> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>>
>> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>>
>> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>>
>> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]]
>>
>>
>>
>> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>>
>> {0.093,47165}
>>
>>
>> This is now about 100 times faster than your original algorithm for the
>> sample data you provided.
>>
>> David %^>
>>
>>
>> ________________________________
>> From: David Bevan
>> Sent: 25 March 2011 21:35
>> To: mathgroup at smc.vnet.net; szhorvat at gmail.com
>> Subject: [mg117635] RE: [mg117576] Re: "set" data structure in
>> Mathematica? (speeding up
>>
>>
>>
>>
>> Your algorithm seems to be counting the number of distinct acyclic paths
>> from a specified node to any sink node (maximal paths below).
>>
>>
>>
>> The following seems to be faster than any solution proposed so far.
>>
>>
>>
>> gReachableFrom[g,s] gives a list of the nodes that can be reached along
>> a path from s.
>>
>> gCanReach[g,t] gives a list of the nodes from which there is a path to
>> t.
>>
>>
>>
>> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach]
>>
>> gNodes[g_]:=Union@Flatten[g/.Rule->List]
>>
>> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>>
>> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>>
>> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>>
>> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>>
>> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>>
>>
>>
>> gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in
>> the list x.
>>
>> gPaths2 'normalises' x before calling gPaths3. This is the key to the
>> performance.
>>
>>
>>
>> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>>
>> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>>
>> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>>
>> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>>
>> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>>
>> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1]
>>
>>
>>
>> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>>
>> {0.328,47165}
>>
>>
>>
>> This compares with about 2 seconds for DrMajorBob's algorithms on my PC.
>>
>>
>>
>> Counting without listing all the paths is a little faster.
>>
>>
>>
>> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>>
>> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>>
>> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>>
>> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>>
>> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>>
>> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]]
>>
>>
>>
>> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>>
>> {0.203,47165}
>>
>>
>>
>> Since all the intermeidate results are saved, a second run is much
>> faster.
>>
>>
>>
>> David %^>
>
>
--
DrMajorBob at yahoo.com