Re: "set" data structure in Mathematica? (speeding up
- To: mathgroup at smc.vnet.net
- Subject: [mg117635] Re: "set" data structure in Mathematica? (speeding up
- From: David Bevan <david.bevan at pb.com>
- Date: Tue, 29 Mar 2011 06:50:58 -0500 (EST)
Here's an improvement:
gCanReach[g,t,x] gives a list of the nodes from which there is a path to t avoiding nodes in list x.
gBetween[g,s,t] gives a list of the nodes that can be reached along a path from s to t which does not revisit s.
gNext[g,s,t] gives a list of the nodes one step from s towards t.
By using these, we avoid searching down "blind alleys".
ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween]
gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List]
gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}]
gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]]
gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]]
ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1]
gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
{0.203,47165}
ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]]
Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
{0.093,47165}
This is now about 100 times faster than your original algorithm for the sample data you provided.
David %^>
________________________________
From: David Bevan
Sent: 25 March 2011 21:35
To: mathgroup at smc.vnet.net; szhorvat at gmail.com
Subject: [mg117635] RE: [mg117576] Re: "set" data structure in Mathematica? (speeding up
Your algorithm seems to be counting the number of distinct acyclic paths from a specified node to any sink node (maximal paths below).
The following seems to be faster than any solution proposed so far.
gReachableFrom[g,s] gives a list of the nodes that can be reached along a path from s.
gCanReach[g,t] gives a list of the nodes from which there is a path to t.
ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach]
gNodes[g_]:=Union@Flatten[g/.Rule->List]
gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in the list x.
gPaths2 'normalises' x before calling gPaths3. This is the key to the performance.
ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1]
gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
{0.328,47165}
This compares with about 2 seconds for DrMajorBob's algorithms on my PC.
Counting without listing all the paths is a little faster.
ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]]
Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
{0.203,47165}
Since all the intermeidate results are saved, a second run is much faster.
David %^>