Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- To: mathgroup at smc.vnet.net
- Subject: [mg117751] Re: "set" data structure in Mathematica? (speeding up graph traversal function)
- From: David Bevan <david.bevan at pb.com>
- Date: Thu, 31 Mar 2011 03:56:52 -0500 (EST)
With this solution, if more than one graph is to be analysed then the memoized functions that don't take g as a parameter must be Cleared between uses otherwise the results will be incorrect. I prefer to take the cautious approach and write properly encapsulated code that removes such a risk. Presumably there is some way of handling it automatically that would retain the performance increase (perhaps by checking the DownValues of the main function to see if the graph has changed). But that assumes that you never want to call the subsidiary functions directly. It all depends on use / reuse requirements. David %^> -----Original Message----- From: DrMajorBob [mailto:btreat1 at austin.rr.com] Sent: 29 March 2011 20:30 To: mathgroup at smc.vnet.net; David Bevan Cc: koopman at sfu.ca; szhorvat at gmail.com Subject: [mg117751] Re: [mg117635] Re: "set" data structure in Mathematica? (speeding up That solution is remarkable, 3 times faster than my 2x improvement on Ray Koopman's code! Below is another 60% speed-up, mostly from avoiding pattern matching on the argument "g", which doesn't change. I also eliminated some "memoization" that wasn't saving time: ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \ gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3] g = graph; (* you called this gtest *) gReverse = Reverse[g, 2]; gNodes = Union@Flatten[g /. Rule -> List]; gSinkNodes = Complement[gNodes, Union[First /@ g]]; gNext[s_] := gNext[s] = ReplaceList[s, g] gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse] gReachableFrom[s_] := gReachableFrom[s] = FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}] gCanReach[t_] := gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}] gCanReach[t_, x_] := gCanReach[t, x] = FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}] gBetween[s_, t_] := gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]] gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]] gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1] gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x] gPaths[s_, t_, x_] := {} gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]] gPaths3[s_, s_, x_] := {{s}} gPaths3[s_, t_, x_] := gPaths3[s, t, x] = (Prepend[#, s] &) /@ Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@ Complement[gNext[s, t], x], 1] Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing {0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, 394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}} Your code's timing was 0.15977 at this machine. Bobby On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com> wrote: > Here's an improvement: > > gCanReach[g,t,x] gives a list of the nodes from which there is a path to > t avoiding nodes in list x. > gBetween[g,s,t] gives a list of the nodes that can be reached along a > path from s to t which does not revisit s. > gNext[g,s,t] gives a list of the nodes one step from s towards t. > > By using these, we avoid searching down "blind alleys". > > > ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween] > > gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List] > > gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] > > gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] > > gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] > > gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] > > gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] > > gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}] > > gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]] > > gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]] > > > > ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] > > gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] > > gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] > > gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] > > gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} > > gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1] > > > > gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing > > {0.203,47165} > > > > ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] > > gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] > > gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] > > gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] > > gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 > > gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]] > > > > Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing > > {0.093,47165} > > > This is now about 100 times faster than your original algorithm for the > sample data you provided. > > David %^> > > > ________________________________ > From: David Bevan > Sent: 25 March 2011 21:35 > To: mathgroup at smc.vnet.net; szhorvat at gmail.com > Subject: [mg117635] RE: [mg117576] Re: "set" data structure in > Mathematica? (speeding up > > > > > Your algorithm seems to be counting the number of distinct acyclic paths > from a specified node to any sink node (maximal paths below). > > > > The following seems to be faster than any solution proposed so far. > > > > gReachableFrom[g,s] gives a list of the nodes that can be reached along > a path from s. > > gCanReach[g,t] gives a list of the nodes from which there is a path to t. > > > > ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach] > > gNodes[g_]:=Union@Flatten[g/.Rule->List] > > gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] > > gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] > > gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] > > gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] > > gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] > > > > gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in > the list x. > > gPaths2 'normalises' x before calling gPaths3. This is the key to the > performance. > > > > ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] > > gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] > > gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] > > gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] > > gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} > > gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1] > > > > gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing > > {0.328,47165} > > > > This compares with about 2 seconds for DrMajorBob's algorithms on my PC. > > > > Counting without listing all the paths is a little faster. > > > > ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] > > gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] > > gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] > > gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] > > gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 > > gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]] > > > > Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing > > {0.203,47165} > > > > Since all the intermeidate results are saved, a second run is much > faster. > > > > David %^> -- DrMajorBob at yahoo.com