Re: Eliminating intermediate results
- To: mathgroup at smc.vnet.net
- Subject: [mg91550] Re: Eliminating intermediate results
- From: Mariano Suárez-Alvarez <mariano.suarezalvarez at gmail.com>
- Date: Wed, 27 Aug 2008 06:41:21 -0400 (EDT)
- References: <200808250905.FAA24382@smc.vnet.net> <g90bfc$n8o$1@smc.vnet.net>
On Aug 26, 4:30 am, Daniel Lichtblau <d... at wolfram.com> wrote: > Mariano Su=E1rez-Alvarez wrote: > > Hi, > > > Suppose you have a function 'generate' which given a parameter p > > produces a list of expressions, and you have a predicate predQ which > > looks at those expressions and does some test. To get the list of > > elements in the list which satisfy the predicate, one says: > > > Select[generate[p], predQ] > > > Now, if the list generated by 'generate[p]' is big with respect to the > > resulting filtered list, this tends to become impracticable soon. > > > For a concrete example: suppose you want to compute the length of > > the list of compositions of a number n in k parts with no zeros. One > > simpleminded way to do that is: > > > Timing@ Length@ Select[Compositions[25, 6], (! MemberQ[#, 0]) &] > > > which returns > > > {10.1025, 118755} > > > An alternative, considerably more clumsy way is to eliminate the > > intermediate list, and count. For example, using the very cool > > Sow/Reap pair of functions, one can do: > > > Timing@ Length@ Module[{n = 30, k = 6, bound}, > > bound = NumberOfCompositions[n, k]; > > Reap[ > > For[i = 0; p = Append[Table[0, {k - 1}], n], i < bound, = i++; > > p = NextComposition[p], > > If[! MemberQ[p, 0], Sow[p]] > > ] > > ][[2, 1]] > > ] > > > which in turns gives > > > {5.9171, 118755} > > > The difference here might very well be due to the different ways > > in which NextComposition and Compositions are impemented > > in Combinatorica (I haven't looked). > > I think this is the case. Mathematica is often better suited, in terms > of speed, for processing in an "all at once" fashion rather than "one at > a time". > > > But code like > > > Length@ Select[Compositions[25, 6], (! MemberQ[#, 0]) &] > > > which constructs huge intermediate data structures seems to > > be quite inefficient in Mathematica. > > I surmise you mean inefficient in memory usage rather than speed? Or is > speed here also an issue? In fact, this ends up being the difference between getting a result and not, since since huge _intermediate_ data can result in a OOM condition. > > Is there a standard approach to aleviate this issue? (preferably > > which does not involved ugly code as the one above, > > preserving the nice compositional style) > > I am not aware of one. You can improve speed by using just a counter > instead of Sow/Reap/Length, and by using Compile to rewrite > Combinatorica's NextComposition. Here is some code to do this. > > nextCompC = Compile[{{l, _Integer, 1}}, Module[{c = l, h = 1, t}, > While[c[[h]] == 0, h++]; > If[h == Length[l], > c = Join[{Last[c]}, Table[0, {Length[l] - 1}]] > , > t = c[[h]] - 1; > c[[{h, h + 1}]] = {0, c[[h + 1]] + 1}; > c[[1]] = t; > ]; > c > ]] > > Now you could do: > > Timing@Module[{n = 30, j = 0, k = 6, bound}, > bound = NumberOfCompositions[n, k]; > p = Append[Table[0, {k - 1}], n]; > For[i = 0, i < bound, i++, > p = nextCompC[p]; > If[! MemberQ[p, 0], j++]]; > j] > > I get a factor of 3 or so speed improvement in this way. Nothing great, > but neither is it useless. Notice my example was just an example. The context it came from needed not the length of the list but the combinations themselves. > > In the context of functional languages, one speaks of > > 'deforestation', > > which is the effort to eliminate intermediate trees (ie, data > > structures) > > which appear frequently when programming in a compositional > > style. The Haskell compiler, for example, iirc, is smart enough > > to *not* construct an intermediate list in the Haskell > > analogue of > > > Select[Range[1000], EvenQ] > > > Coming from Haskell, it be nice to be able to rely on > > Mathematica to be smart, so I don't have to ;) > > > -- m > > There might be ways to emulate this in Mathematica, but not using > Select. The semantics are that it will evaluate its first argument and > only then apply the predicate argument to cull from that first argument. Another way would be having something in the spirit of python' s 'yield' (Reap/Sow probably construct their lists in a similar way internally, I imagine, but Sow does not return control). I can't see how to implement such a thing, for one would need closures of some form, as well as magically informing all list-consuming functions, like Select,Map, Length and Count, so that they become able to consume they list arguments step by step, and/or product their list results step by step. SparseArray could possibly help, at least for lists for which one can do 'random access' (I don't think one knows how to do that for Compositions, for example, though) But things like Sum[i, {i, SparseArray[{i_ -> i^2}, {100000000}, 0]}] appear to turn the SparseArray into an actual list before summing, and Count[SparseArray[{3 -> 1}, {10000}, 0], 0] does not work (one has to convert the SparseArray into a list by hand) > One approach might be to generate the compositions in a tree-like > fashion, and prune branches for which all leaves (that is, individual > compositions) will of necessity contain zeros. Maybe set it up as a > recursion wherein we have > > compositions[n,k] = Join[Table[Prepend[compositions[n-j,k-1],j], {j,1,n= }] > > (I have not tried this and offer no guarantees as to how close or far it > is from workable.) Yup. Notice the example using compositions is just an example. What motivated my post was that in situations like in the example, one knows that the computation can be done is essentially constant space, yet one has to fight the language in order to do so :/ I wonder if in Bizarro World Mathematica is a fully lazy evaluator? :) -- m
- References:
- Eliminating intermediate results
- From: Mariano Suárez-Alvarez <mariano.suarezalvarez@gmail.com>
- Eliminating intermediate results