Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- To: mathgroup at smc.vnet.net
- Subject: [mg117774] Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Thu, 31 Mar 2011 04:01:01 -0500 (EST)
It's not hard to "properly" encapsulate the code, either way. For instance: acyclicPaths[g_] := Module[{forward, reverse, sources, nodes, sinks, to, from, next, last, reachableFrom, canReach, between, maximalPaths, paths, paths2, paths3}, forward = Dispatch@g; reverse = Dispatch@Reverse[g, 2]; sources = Union@g[[All, 1]]; nodes = Union@Flatten[g /. Rule -> List]; sinks = Complement[nodes, sources]; reachableFrom[s_] := reachableFrom[s] = FixedPoint[Union @@ Join[{#}, next /@ #] &, {s}]; canReach[t_, x_] := canReach[t, x] = FixedPoint[Union @@ Join[{#}, last /@ #] &, {t}]; between[s_, t_] := between[s, t] = Intersection[reachableFrom@s, canReach[t, {s}]]; next[s_] := next[s] = ReplaceList[s, forward]; last[s_] := last[s] = ReplaceList[s, reverse]; next[s_, t_] := Intersection[next@s, between[s, t]]; maximalPaths[s_] := Flatten[(paths[s, #, {}] &) /@ sinks, 1]; paths[s_, t_, x_] /; MemberQ[reachableFrom@s, t] := paths3[s, t, Intersection[x, between[s, t]]]; paths[s_, t_, x_] = {}; paths2[s_, t_, x_] := paths3[s, t, Intersection[x, between[s, t]]]; paths3[s_, s_, x_] := {{s}}; paths3[s_, t_, x_] := paths3[s, t, x] = (Prepend[#, s] &) /@ Flatten[(paths2[#, t, Prepend[x, s]] &) /@ Complement[next[s, t], x], 1]; maximalPaths /@ sources ] Length /@ (gtest = acyclicPaths@graph) // Timing {0.128763, {1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, 394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}} If it's necessary to save memoized functions, that can also be done. Bobby On Wed, 30 Mar 2011 02:56:31 -0500, David Bevan <david.bevan at pb.com> wrote: > > With this solution, if more than one graph is to be analysed then the > memoized functions that don't take g as a parameter must be Cleared > between uses otherwise the results will be incorrect. I prefer to take > the cautious approach and write properly encapsulated code that removes > such a risk. > > Presumably there is some way of handling it automatically that would > retain the performance increase (perhaps by checking the DownValues of > the main function to see if the graph has changed). But that assumes > that you never want to call the subsidiary functions directly. > > It all depends on use / reuse requirements. > > David %^> > > > -----Original Message----- > From: DrMajorBob [mailto:btreat1 at austin.rr.com] > Sent: 29 March 2011 20:30 > To: mathgroup at smc.vnet.net; David Bevan > Cc: koopman at sfu.ca; szhorvat at gmail.com > Subject: Re: [mg117635] Re: "set" data structure in Mathematica? > (speeding up > > That solution is remarkable, 3 times faster than my 2x improvement on Ray > Koopman's code! > > Below is another 60% speed-up, mostly from avoiding pattern matching on > the argument "g", which doesn't change. > > I also eliminated some "memoization" that wasn't saving time: > > ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \ > gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3] > g = graph; (* you called this gtest *) > gReverse = Reverse[g, 2]; > gNodes = Union@Flatten[g /. Rule -> List]; > gSinkNodes = Complement[gNodes, Union[First /@ g]]; > > gNext[s_] := gNext[s] = ReplaceList[s, g] > gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse] > gReachableFrom[s_] := > gReachableFrom[s] = > FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}] > gCanReach[t_] := > gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}] > gCanReach[t_, x_] := > gCanReach[t, x] = > FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}] > gBetween[s_, t_] := > gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]] > gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]] > gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1] > gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x] > gPaths[s_, t_, x_] := {} > gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]] > gPaths3[s_, s_, x_] := {{s}} > gPaths3[s_, t_, x_] := > gPaths3[s, t, x] = (Prepend[#, s] &) /@ > Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@ > Complement[gNext[s, t], x], 1] > > Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing > > {0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, > 394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}} > > Your code's timing was 0.15977 at this machine. > > Bobby > > On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com> > wrote: > >> Here's an improvement: >> >> gCanReach[g,t,x] gives a list of the nodes from which there is a path to >> t avoiding nodes in list x. >> gBetween[g,s,t] gives a list of the nodes that can be reached along a >> path from s to t which does not revisit s. >> gNext[g,s,t] gives a list of the nodes one step from s towards t. >> >> By using these, we avoid searching down "blind alleys". >> >> >> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween] >> >> gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List] >> >> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] >> >> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] >> >> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] >> >> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] >> >> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] >> >> gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}] >> >> gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]] >> >> gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]] >> >> >> >> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] >> >> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] >> >> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] >> >> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] >> >> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} >> >> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1] >> >> >> >> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing >> >> {0.203,47165} >> >> >> >> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] >> >> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] >> >> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] >> >> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] >> >> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 >> >> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]] >> >> >> >> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing >> >> {0.093,47165} >> >> >> This is now about 100 times faster than your original algorithm for the >> sample data you provided. >> >> David %^> >> >> >> ________________________________ >> From: David Bevan >> Sent: 25 March 2011 21:35 >> To: mathgroup at smc.vnet.net; szhorvat at gmail.com >> Subject: [mg117635] RE: [mg117576] Re: "set" data structure in >> Mathematica? (speeding up >> >> >> >> >> Your algorithm seems to be counting the number of distinct acyclic paths >> from a specified node to any sink node (maximal paths below). >> >> >> >> The following seems to be faster than any solution proposed so far. >> >> >> >> gReachableFrom[g,s] gives a list of the nodes that can be reached along >> a path from s. >> >> gCanReach[g,t] gives a list of the nodes from which there is a path to >> t. >> >> >> >> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach] >> >> gNodes[g_]:=Union@Flatten[g/.Rule->List] >> >> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] >> >> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] >> >> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] >> >> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] >> >> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] >> >> >> >> gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in >> the list x. >> >> gPaths2 'normalises' x before calling gPaths3. This is the key to the >> performance. >> >> >> >> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] >> >> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] >> >> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] >> >> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] >> >> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} >> >> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1] >> >> >> >> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing >> >> {0.328,47165} >> >> >> >> This compares with about 2 seconds for DrMajorBob's algorithms on my PC. >> >> >> >> Counting without listing all the paths is a little faster. >> >> >> >> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] >> >> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] >> >> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] >> >> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] >> >> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 >> >> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]] >> >> >> >> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing >> >> {0.203,47165} >> >> >> >> Since all the intermeidate results are saved, a second run is much >> faster. >> >> >> >> David %^> > > -- DrMajorBob at yahoo.com