Re: "set" data structure in Mathematica? (speeding up
- To: mathgroup at smc.vnet.net
- Subject: [mg117706] Re: "set" data structure in Mathematica? (speeding up
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Wed, 30 Mar 2011 04:10:47 -0500 (EST)
That solution is remarkable, 3 times faster than my 2x improvement on Ray Koopman's code! Below is another 60% speed-up, mostly from avoiding pattern matching on the argument "g", which doesn't change. I also eliminated some "memoization" that wasn't saving time: ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \ gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3] g = graph; (* you called this gtest *) gReverse = Reverse[g, 2]; gNodes = Union@Flatten[g /. Rule -> List]; gSinkNodes = Complement[gNodes, Union[First /@ g]]; gNext[s_] := gNext[s] = ReplaceList[s, g] gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse] gReachableFrom[s_] := gReachableFrom[s] = FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}] gCanReach[t_] := gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}] gCanReach[t_, x_] := gCanReach[t, x] = FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}] gBetween[s_, t_] := gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]] gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]] gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1] gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x] gPaths[s_, t_, x_] := {} gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]] gPaths3[s_, s_, x_] := {{s}} gPaths3[s_, t_, x_] := gPaths3[s, t, x] = (Prepend[#, s] &) /@ Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@ Complement[gNext[s, t], x], 1] Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing {0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, 394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}} Your code's timing was 0.15977 at this machine. Bobby On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com> wrote: > Here's an improvement: > > gCanReach[g,t,x] gives a list of the nodes from which there is a path to > t avoiding nodes in list x. > gBetween[g,s,t] gives a list of the nodes that can be reached along a > path from s to t which does not revisit s. > gNext[g,s,t] gives a list of the nodes one step from s towards t. > > By using these, we avoid searching down "blind alleys". > > > ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween] > > gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List] > > gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] > > gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] > > gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] > > gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] > > gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] > > gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}] > > gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]] > > gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]] > > > > ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] > > gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] > > gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] > > gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] > > gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} > > gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1] > > > > gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing > > {0.203,47165} > > > > ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] > > gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] > > gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] > > gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]] > > gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 > > gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]] > > > > Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing > > {0.093,47165} > > > This is now about 100 times faster than your original algorithm for the > sample data you provided. > > David %^> > > > ________________________________ > From: David Bevan > Sent: 25 March 2011 21:35 > To: mathgroup at smc.vnet.net; szhorvat at gmail.com > Subject: [mg117635] RE: [mg117576] Re: "set" data structure in > Mathematica? (speeding up > > > > > Your algorithm seems to be counting the number of distinct acyclic paths > from a specified node to any sink node (maximal paths below). > > > > The following seems to be faster than any solution proposed so far. > > > > gReachableFrom[g,s] gives a list of the nodes that can be reached along > a path from s. > > gCanReach[g,t] gives a list of the nodes from which there is a path to t. > > > > ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach] > > gNodes[g_]:=Union@Flatten[g/.Rule->List] > > gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]] > > gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g] > > gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]] > > gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}] > > gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}] > > > > gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in > the list x. > > gPaths2 'normalises' x before calling gPaths3. This is the key to the > performance. > > > > ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3] > > gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1] > > gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}] > > gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] > > gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}} > > gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1] > > > > gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing > > {0.328,47165} > > > > This compares with about 2 seconds for DrMajorBob's algorithms on my PC. > > > > Counting without listing all the paths is a little faster. > > > > ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3] > > gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]] > > gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0] > > gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]] > > gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1 > > gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]] > > > > Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing > > {0.203,47165} > > > > Since all the intermeidate results are saved, a second run is much > faster. > > > > David %^> -- DrMajorBob at yahoo.com