MathGroup Archive 2011

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: "set" data structure in Mathematica? (speeding up graph traversal function)

  • To: mathgroup at smc.vnet.net
  • Subject: [mg117774] Re: "set" data structure in Mathematica? (speeding up graph traversal function)
  • From: DrMajorBob <btreat1 at austin.rr.com>
  • Date: Thu, 31 Mar 2011 04:01:01 -0500 (EST)

It's not hard to "properly" encapsulate the code, either way. For instance:

acyclicPaths[g_] :=
  Module[{forward, reverse, sources, nodes, sinks, to, from, next,
    last, reachableFrom, canReach, between, maximalPaths, paths,
    paths2, paths3},
   forward = Dispatch@g;
   reverse = Dispatch@Reverse[g, 2];
   sources = Union@g[[All, 1]];
   nodes = Union@Flatten[g /. Rule -> List];
   sinks = Complement[nodes, sources];
   reachableFrom[s_] :=
    reachableFrom[s] = FixedPoint[Union @@ Join[{#}, next /@ #] &, {s}];
   canReach[t_, x_] :=
    canReach[t, x] = FixedPoint[Union @@ Join[{#}, last /@ #] &, {t}];
   between[s_, t_] :=
    between[s, t] = Intersection[reachableFrom@s, canReach[t, {s}]];
   next[s_] := next[s] = ReplaceList[s, forward];
   last[s_] := last[s] = ReplaceList[s, reverse];
   next[s_, t_] := Intersection[next@s, between[s, t]];
   maximalPaths[s_] := Flatten[(paths[s, #, {}] &) /@ sinks, 1];
   paths[s_, t_, x_] /; MemberQ[reachableFrom@s, t] :=
    paths3[s, t, Intersection[x, between[s, t]]];
   paths[s_, t_, x_] = {};
   paths2[s_, t_, x_] := paths3[s, t, Intersection[x, between[s, t]]];
   paths3[s_, s_, x_] := {{s}};
   paths3[s_, t_, x_] :=
    paths3[s, t, x] = (Prepend[#, s] &) /@
      Flatten[(paths2[#, t, Prepend[x, s]] &) /@
        Complement[next[s, t], x], 1];
   maximalPaths /@ sources
   ]
Length /@ (gtest = acyclicPaths@graph) // Timing

{0.128763, {1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32, 394,
   355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}}

If it's necessary to save memoized functions, that can also be done.

Bobby

On Wed, 30 Mar 2011 02:56:31 -0500, David Bevan <david.bevan at pb.com> wrote:

>
> With this solution, if more than one graph is to be analysed then the  
> memoized functions that don't take g as a parameter must be Cleared  
> between uses otherwise the results will be incorrect. I prefer to take  
> the cautious approach and write properly encapsulated code that removes  
> such a risk.
>
> Presumably there is some way of handling it automatically that would  
> retain the performance increase (perhaps by checking the DownValues of  
> the main function to see if the graph has changed). But that assumes  
> that you never want to call the subsidiary functions directly.
>
> It all depends on use / reuse requirements.
>
> David %^>
>
>
> -----Original Message-----
> From: DrMajorBob [mailto:btreat1 at austin.rr.com]
> Sent: 29 March 2011 20:30
> To: mathgroup at smc.vnet.net; David Bevan
> Cc: koopman at sfu.ca; szhorvat at gmail.com
> Subject: Re: [mg117635] Re: "set" data structure in Mathematica?  
> (speeding up
>
> That solution is remarkable, 3 times faster than my 2x improvement on Ray
> Koopman's code!
>
> Below is another 60% speed-up, mostly from avoiding pattern matching on
> the argument "g", which doesn't change.
>
> I also eliminated some "memoization" that wasn't saving time:
>
> ClearAll[gNodes, gSinkNodes, gNext, gPrev, gReachableFrom, gCanReach, \
> gBetween, gMaximalPaths, gPaths, gPaths2, gPaths3]
> g = graph; (* you called this gtest *)
> gReverse = Reverse[g, 2];
> gNodes = Union@Flatten[g /. Rule -> List];
> gSinkNodes = Complement[gNodes, Union[First /@ g]];
>
> gNext[s_] := gNext[s] = ReplaceList[s, g]
> gPrev[t_] := gPrev[t] = ReplaceList[t, gReverse]
> gReachableFrom[s_] :=
>   gReachableFrom[s] =
>    FixedPoint[Union @@ Join[{#}, gNext /@ #] &, {s}]
> gCanReach[t_] :=
>   gCanReach[t] = FixedPoint[Union @@ Join[{#}, gPrev /@ #] &, {t}]
> gCanReach[t_, x_] :=
>   gCanReach[t, x] =
>    FixedPoint[Complement[Union @@ Join[{#}, gPrev /@ #], x] &, {t}]
> gBetween[s_, t_] :=
>   gBetween[s, t] = Intersection[gReachableFrom[s], gCanReach[t, {s}]]
> gNext[s_, t_] := Intersection[gNext[s], gBetween[s, t]]
> gMaximalPaths[s_] := Flatten[(gPaths[s, #, {}] &) /@ gSinkNodes, 1]
> gPaths[s_, t_, x_] /; MemberQ[gReachableFrom[s], t] := gPaths2[s, t, x]
> gPaths[s_, t_, x_] := {}
> gPaths2[s_, t_, x_] := gPaths3[s, t, Intersection[x, gBetween[s, t]]]
> gPaths3[s_, s_, x_] := {{s}}
> gPaths3[s_, t_, x_] :=
>   gPaths3[s, t, x] = (Prepend[#, s] &) /@
>     Flatten[(gPaths2[#, t, Prepend[x, s]] &) /@
>       Complement[gNext[s, t], x], 1]
>
> Length /@ (gtest = gMaximalPaths /@ gNodes) // Timing
>
> {0.097408, {1, 1, 1, 1, 2, 3, 4, 7, 10, 8, 7, 7, 7, 10, 15, 15, 32,
>    394, 355, 396, 863, 820, 1225, 891, 1211, 827, 11269, 13353, 15430}}
>
> Your code's timing was 0.15977 at this machine.
>
> Bobby
>
> On Tue, 29 Mar 2011 06:50:58 -0500, David Bevan <david.bevan at pb.com>  
> wrote:
>
>> Here's an improvement:
>>
>> gCanReach[g,t,x] gives a list of the nodes from which there is a path to
>> t avoiding nodes in list x.
>> gBetween[g,s,t] gives a list of the nodes that can be reached along a
>> path from s to t which does not revisit s.
>> gNext[g,s,t] gives a list of the nodes one step from s towards t.
>>
>> By using these, we avoid searching down "blind alleys".
>>
>>
>> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach,gBetween]
>>
>> gNodes[g_]:=gNodes[g]=Union@Flatten[g/.Rule->List]
>>
>> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>>
>> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>>
>> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>>
>> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>>
>> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>>
>> gCanReach[g_,t_,x_]:=gCanReach[g,t,x]=FixedPoint[Complement[Union@@Join[{#},(gPrev[g,#]&)/@#],x]&,{t}]
>>
>> gBetween[g_,s_,t_]:=gBetween[g,s,t]=Intersection[gReachableFrom[g,s],gCanReach[g,t,{s}]]
>>
>> gNext[g_,s_,t_]:=gNext[g,s,t]=Intersection[gNext[g,s],gBetween[g,s,t]]
>>
>>
>>
>> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>>
>> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>>
>> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>>
>> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>>
>> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>>
>> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x],1]
>>
>>
>>
>> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>>
>> {0.203,47165}
>>
>>
>>
>> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>>
>> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>>
>> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>>
>> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gBetween[g,s,t]]]
>>
>> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>>
>> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths2[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s,t],x]]
>>
>>
>>
>> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>>
>> {0.093,47165}
>>
>>
>> This is now about 100 times faster than your original algorithm for the
>> sample data you provided.
>>
>> David %^>
>>
>>
>> ________________________________
>> From: David Bevan
>> Sent: 25 March 2011 21:35
>> To: mathgroup at smc.vnet.net; szhorvat at gmail.com
>> Subject: [mg117635] RE: [mg117576] Re: "set" data structure in
>> Mathematica? (speeding up
>>
>>
>>
>>
>> Your algorithm seems to be counting the number of distinct acyclic paths
>> from a specified node to any sink node (maximal paths below).
>>
>>
>>
>> The following seems to be faster than any solution proposed so far.
>>
>>
>>
>> gReachableFrom[g,s] gives a list of the nodes that can be reached along
>> a path from s.
>>
>> gCanReach[g,t] gives a list of the nodes from which there is a path to  
>> t.
>>
>>
>>
>> ClearAll[gNodes,gSinkNodes,gNext,gPrev,gReachableFrom,gCanReach]
>>
>> gNodes[g_]:=Union@Flatten[g/.Rule->List]
>>
>> gSinkNodes[g_]:=Complement[gNodes[g],Union[First/@g]]
>>
>> gNext[g_,s_]:=gNext[g,s]=ReplaceList[s,g]
>>
>> gPrev[g_,t_]:=gPrev[g,t]=ReplaceList[t,Reverse[g,2]]
>>
>> gReachableFrom[g_,s_]:=gReachableFrom[g,s]=FixedPoint[Union@@Join[{#},(gNext[g,#]&)/@#]&,{s}]
>>
>> gCanReach[g_,t_]:=gCanReach[g,t]=FixedPoint[Union@@Join[{#},(gPrev[g,#]&)/@#]&,{t}]
>>
>>
>>
>> gPaths[g,s,t,x] gives a list of the paths from s to t excluding nodes in
>> the list x.
>>
>> gPaths2 'normalises' x before calling gPaths3. This is the key to the
>> performance.
>>
>>
>>
>> ClearAll[gMaximalPaths,gPaths,gPaths2,gPaths3]
>>
>> gMaximalPaths[g_,s_]:=Flatten[(gPaths[g,s,#,{}]&)/@gSinkNodes[g],1]
>>
>> gPaths[g_,s_,t_,x_]:=gPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gPaths2[g,s,t,x],{}]
>>
>> gPaths2[g_,s_,t_,x_]:=gPaths2[g,s,t,x]=gPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>>
>> gPaths3[g_,s_,s_,x_]:=gPaths3[g,s,s,x]={{s}}
>>
>> gPaths3[g_,s_,t_,x_]:=gPaths3[g,s,t,x]=(Prepend[#,s]&)/@Flatten[(gPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x],1]
>>
>>
>>
>> gMaximalPaths[gtest,#]&/@gNodes[gtest]//Flatten[#,1]&//Length//Timing
>>
>> {0.328,47165}
>>
>>
>>
>> This compares with about 2 seconds for DrMajorBob's algorithms on my PC.
>>
>>
>>
>> Counting without listing all the paths is a little faster.
>>
>>
>>
>> ClearAll[gNumMaximalPaths,gNumPaths,gNumPaths2,gNumPaths3]
>>
>> gNumMaximalPaths[g_,s_]:=Total[(gNumPaths[g,s,#,{}]&)/@gSinkNodes[g]]
>>
>> gNumPaths[g_,s_,t_,x_]:=gNumPaths[g,s,t,x]=If[MemberQ[gReachableFrom[g,s],t],gNumPaths2[g,s,t,x],0]
>>
>> gNumPaths2[g_,s_,t_,x_]:=gNumPaths2[g,s,t,x]=gNumPaths3[g,s,t,Intersection[x,gReachableFrom[g,s],gCanReach[g,t]]]
>>
>> gNumPaths3[g_,s_,s_,x_]:=gNumPaths3[g,s,s,x]=1
>>
>> gNumPaths3[g_,s_,t_,x_]:=gNumPaths3[g,s,t,x]=Total[(gNumPaths[g,#,t,Prepend[x,s]]&)/@Complement[gNext[g,s],x]]
>>
>>
>>
>> Total[gNumMaximalPaths[gtest,#]&/@gNodes[gtest]]//Timing
>>
>> {0.203,47165}
>>
>>
>>
>> Since all the intermeidate results are saved, a second run is much
>> faster.
>>
>>
>>
>> David %^>
>
>


-- 
DrMajorBob at yahoo.com


  • Prev by Date: an example... Re: Why Mathematica does not issue a warning when the calculations
  • Next by Date: Re: Why Mathematica does not issue a warning when the calculations
  • Previous by thread: Re: "set" data structure in Mathematica? (speeding up graph traversal function)
  • Next by thread: Copy to clipboard in tabular format?