Some speedups
- To: mathgroup at smc.vnet.net
- Subject: [mg37292] Some speedups
- From: "Allan Hayes" <hay at haystack.demon.co.uk>
- Date: Tue, 22 Oct 2002 04:47:13 -0400 (EDT)
- References: <aookqg$pkq$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
The notebook code below contains ways of speeding up some uses of Table, Sort and Split and avoiding building up inefficienlty large intermediate expressions. I put them together while looking into the various solutions in the thread "grouping and averaging {x,y} pairs of data" -- they helped give what seems to be the fastest uncompiled solution yet. Allan --------------------- Allan Hayes Mathematica Training and Consulting Leicester UK www.haystack.demon.co.uk hay at haystack.demon.co.uk Voice: +44 (0)116 271 4198 Fax: +44 (0)870 164 0565 To make a notebook from the following, copy between the two lines +++++++, past into a notebook, click "Yes" in the panel that pops up. +++++++ Notebook[{ Cell[CellGroupData[{ Cell[TextData[{ "Speedups \n", StyleBox["Table, Sort, Split", FontSize->18] }], "Title"], Cell["Allan Hayes October 2002", "Subsubtitle"], Cell[BoxData[ \(Off[General::"\<spell1\>", General::"\<spell\>"]\)], "Input", InitializationCell->True], Cell[CellGroupData[{ Cell["General Techniques", "Section"], Cell[CellGroupData[{ Cell["Table", "Subsection"], Cell[TextData[{ "The following illustrates a generally applicable point: that \ inserting large expressions directly using a function or ", StyleBox["With", FontFamily->"Courier"], " can lead to inefficiently large expressions. This can be avoided \ by storing in a local variable." }], "Text"], Cell[BoxData[{ \(\(ranges = \ Table[{1, 200}, {10000}];\)\), "\n", \(\(data\ = \ Range[100000];\)\)}], "Input"], Cell["This is quite slow.", "Text"], Cell[BoxData[ \(takesX[data_, \ ranges_] := \[IndentingNewLine]\(Take[data, #] &\) /@ ranges\)], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(\(takesX[data, \ ranges];\) // Timing\)], "Input"], Cell[BoxData[ \({3.789999999999992`\ Second, Null}\)], "Output"] }, Closed]], Cell[TextData[{ "The use of the local variable ", StyleBox["dd", FontFamily->"Courier"], " below gives much quicker code" }], "Text"], Cell[BoxData[ \(takes[data_, \ ranges_] := \[IndentingNewLine]Module[{dd = data}, \(Take[dd, #] &\) /@ ranges]\)], "Input", InitializationCell->True, FontColor->RGBColor[0, 0, 1]], Cell[CellGroupData[{ Cell[BoxData[ \(\(takes[data, \ ranges];\) // Timing\)], "Input"], Cell[BoxData[ \({1.4899999999999807`\ Second, Null}\)], "Output"] }, Closed]], Cell[TextData[{ "We might have expected that the direct insertion of ", StyleBox["data", FontFamily->"Courier"], " into ", StyleBox["Take[data,#]&", FontFamily->"Courier New"], " would be quicker, but it looks as if the very big expression that \ is built up when using this technique slows things down.\nIn the \ second form ", StyleBox["dd", FontFamily->"Courier"], " is evaluated once as each expression\n\t\t", StyleBox["Take[dd,#]&[{1,200}]", FontFamily->"Courier New"], "\nis evaluated." }], "Text"], Cell[TextData[{ "As an extra check, use ", StyleBox["With", FontFamily->"Courier"], " to input directly:" }], "Text"], Cell[BoxData[ \(takesY[data_, \ ranges_] := \[IndentingNewLine]With[{dd = data}, \(Take[dd, #] &\) /@ ranges]\)], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(\(takesY[data, \ ranges];\) // Timing\)], "Input"], Cell[BoxData[ \({3.240000000000002`\ Second, Null}\)], "Output"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["Sort", "Subsection"], Cell[BoxData[{ \(\(k = 2;\)\), "\[IndentingNewLine]", \(\(data = \((SeedRandom[0]; Table[Random[ Integer, {1, 100}], {100000}, {k}])\);\)\)}], "Input"], Cell["Let's sort the data:", "Text"], Cell[CellGroupData[{ Cell[BoxData[ \(\(Sort[data];\) // Timing\)], "Input"], Cell[BoxData[ \({1.9299999999999997`\ Second, Null}\)], "Output"] }, Closed]], Cell["\<\ Sorting by the first coordinate might be expected to be quicker, but \ the following slows the calculation down markedly!\ \>", "Text"], Cell[CellGroupData[{ Cell[BoxData[ \(\(Sort[data, #1[\([1]\)] \[LessEqual] #2[\([1]\)] &];\) // Timing\)], "Input"], Cell[BoxData[ \({150.16`\ Second, Null}\)], "Output"] }, Closed]], Cell[TextData[{ "But using ", StyleBox["Ordering", FontFamily->"Courier"], ", as below will usually speed things up (the advantage increases \ with the size of k)." }], "Text"], Cell[BoxData[ \(sortn[dat_, n_] := \[IndentingNewLine]dat[\([Ordering[ dat[\([All, n]\)]]]\)]\)], "Input", InitializationCell->True, FontColor->RGBColor[0, 0, 1]], Cell[CellGroupData[{ Cell[BoxData[ \(\(sortn[data, 1];\) // Timing\)], "Input"], Cell[BoxData[ \({1.4799999999999898`\ Second, Null}\)], "Output"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["Split", "Subsection"], Cell[BoxData[{ \(\(data = \((SeedRandom[0]; Table[Random[ Integer, {1, 100}], {100000}, {2}])\);\)\), "\n", \(\(sd = \ Sort[data];\)\)}], "Input"], Cell[TextData[{ "Split ", StyleBox["data", FontFamily->"Courier"], " by the first coordinate" }], "Text"], Cell[CellGroupData[{ Cell[BoxData[ \(\(Split[sd, \ #1[\([1]\)] \[Equal] #2[\([1]\)] &];\) // Timing\)], "Input"], Cell[BoxData[ \({10.549999999999999`\ Second, Null}\)], "Output"] }, Closed]], Cell["The following (following Hartmut Wolf) is quicker ", "Text"], Cell[BoxData[ \(splitnX[dat_, n_] := \[IndentingNewLine]Module[{s, p, tks}, \[IndentingNewLine]s = Split[dat[\([All, n]\)]]; p = FoldList[\ #1 + #2 &, 0, \ \ Length /@ s]; \[IndentingNewLine]\(Take[ dat, #] &\) /@ Transpose[{Drop[p, \(-1\)] + 1, Rest[p]}]\[IndentingNewLine]]\)], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(\(splitnX[sd, 1];\) // Timing\)], "Input"], Cell[BoxData[ \({4.939999999999998`\ Second, Null}\)], "Output"] }, Closed]], Cell[TextData[{ "And using the trick we found above when looking at", StyleBox[" Table ", FontFamily->"Courier New"], "it gets still quicker" }], "Text"], Cell[BoxData[ \(splitn[dat_, n_] := \[IndentingNewLine]Module[{s, p, tks, dd}, \[IndentingNewLine]ss = \(s = Split[dat[\([All, n]\)]]\); p = FoldList[\ #1 + #2 &, 0, \ \ Length /@ s]; \[IndentingNewLine]dd = \ dat; \[IndentingNewLine]\(Take[dd, #] &\) /@ Transpose[{Drop[p, \(-1\)] + 1, Rest[p]}]\[IndentingNewLine]]\)], "Input", InitializationCell->True, FontColor->RGBColor[0, 0, 1]], Cell[CellGroupData[{ Cell[BoxData[ \(\(splitn[sd, 1];\) // Timing\)], "Input"], Cell[BoxData[ \({2.09`\ Second, Null}\)], "Output"] }, Closed]] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["Application to Binned Averages", "Section"], Cell["\<\ Daniel Lichtblau's elegant compiled solution is quickest. The \ uncompiled version, 2., comes fairly close, but will probably fall \ further behind with very big inputs.\ \>", "Text"], Cell[TextData[{ "To make the timings more comparable, each section below starts \ with ", StyleBox["Quit", FontFamily->"Courier New"], ". This must be evaluated separately before the rest of the code. \ You will be asked if you want to evaluate the initialization cells \ when continuing with rest of the code - click \"Yes\". " }], "Text"], Cell[CellGroupData[{ Cell["1. Hartmut Wolf", "Subsection", CellDingbat->None, CellMargins->{{10, Inherited}, {Inherited, Inherited}}], Cell[BoxData[ \(Quit\)], "Input"], Cell[BoxData[ \(\(data = \((SeedRandom[0]; Table[Random[ Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], Cell[BoxData[ \(\(\(binnedAverage1[data_, max_] := Module[{v, i, ix, ixx, ixxx}, {i, v} = With[{rr = Range[max]}, Transpose[ Sort[Join[data, Transpose[{rr, rr - rr}]]]]]; \[IndentingNewLine]ix = Split[i]; \[IndentingNewLine]ixx = FoldList[Plus[#1, Length[#2]] &, 0, ix]; \[IndentingNewLine]ixxx = Transpose[ Transpose[Partition[ixx, 2, 1]] + {1, 0}]; \[IndentingNewLine]Transpose[{First /@ ix, \(\(\((Plus @@ #)\)/Max[Length[#] - 1, 1] &\)[ Take[v, #]] &\) /@ ixxx}]]\)\(\n\) \)\)], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(Table[\(\(\((r1 = binnedAverage1[data, 100])\);\) // Timing\) // First, {5}]\)], "Input"], Cell[BoxData[ \({3.790000000000001`\ Second, 4.609999999999999`\ Second, 4.17`\ Second, 5.`\ Second, 4.010000000000002`\ Second}\)], "Output"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["2. Allan Hayes", "Subsection", CellDingbat->None, CellMargins->{{10, Inherited}, {Inherited, Inherited}}], Cell[TextData[{ "Note that ", StyleBox["Tr[lst]", FontFamily->"Courier"], StyleBox[" is used ", FontFamily->"Times New Roman"], StyleBox["ins", FontFamily->"Times New Roman"], "tead o", StyleBox["f ", FontFamily->"Times New Roman"], StyleBox["Plus@@lst", FontFamily->"Courier"], StyleBox[" ", FontFamily->"Times New Roman"], StyleBox["to sum", FontFamily->"Times New Roman"], StyleBox[" ", FontFamily->"Times New Roman"], StyleBox["lst", FontFamily->"Courier"], "." }], "Text"], Cell[BoxData[ \(Quit\)], "Input"], Cell[BoxData[ \(\(data = \((SeedRandom[0]; Table[Random[ Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], Cell[BoxData[ \(binnedAverage2[data_, \ max_: 0] := \[IndentingNewLine]Module[{spl, avs, inds, dd, zs}, \[IndentingNewLine]spl = splitn[sortn[data, 1], 1]; \[IndentingNewLine]avs = \(\((Tr[#1[\([All, 2]\)]]/ Length[#1])\) &\) /@ spl; \[IndentingNewLine]inds = \ spl[\([All, 1, 1]\)]; \[IndentingNewLine]dd\ = \ Range[Max[Last[inds], max]]; \[IndentingNewLine]zs = \ dd - dd; \[IndentingNewLine]zs[\([inds]\)] = \ avs; \[IndentingNewLine]Transpose[{dd, zs}]\[IndentingNewLine]]\)], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(Table[\(binnedAverage2[data] // Timing\) // First, {5}]\)], "Input"], Cell[BoxData[ \({2.8100000000000005`\ Second, 3.0700000000000003`\ Second, 2.799999999999999`\ Second, 2.799999999999999`\ Second, 2.75`\ Second}\)], "Output"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["3. Daniel Lichtblau", "Subsection", CellDingbat->None, CellMargins->{{10, Inherited}, {Inherited, Inherited}}], Cell[BoxData[ \(Quit\)], "Input"], Cell[BoxData[ \(\(data = \((SeedRandom[0]; Table[Random[ Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], Cell[BoxData[ StyleBox[\(averageByBin3[ data : {{_, _} .. }]\ := \ \[IndentingNewLine]Module[\n\ {res = Transpose[data]}, \n res\ = \ averagebyBinC[res[\([1]\)], res[\([2]\)]]; \n Transpose[{Range[Length[res]], res}]\n]\), FormatType->StandardForm]], "Input"], Cell[BoxData[ StyleBox[\(\(\n\)\(\(averagebyBinC\ = \ Compile[{{xvals, _Integer, 1}, {yvals, _Real, 1}}, \n Module[{len = Max[xvals], \ binsizes, \ averages, len2 = Length[xvals], \ indx}, \n binsizes\ = \ Table[0, {len}]; \n averages\ = \ Table[0. , {len}]; \n Do\ [\nindx\ = \ xvals[\([j]\)]; \n\(binsizes[\([indx]\)]++\); \n averages[\([indx]\)]\ += \ yvals[\([j]\)], \n{j, len2}\n]; \n binsizes\ = \ Map[If\ [# == 0, 1, #] &, \ binsizes]; \n averages\ /\ binsizes\n]\n];\)\(\n\) \)\), FormatType->StandardForm]], "Input"], Cell[CellGroupData[{ Cell[BoxData[ \(Table[\(\(averageByBin3[data];\) // Timing\) // First, {5}]\)], "Input"], Cell[BoxData[ \({2.3099999999999996`\ Second, 2.1400000000000006`\ Second, 2.3599999999999994`\ Second, 2.3100000000000005`\ Second, 2.3599999999999994`\ Second}\)], "Output"] }, Closed]] }, Closed]] }, Closed]] }, Open ]] }, FrontEndVersion->"4.2 for Microsoft Windows", ScreenRectangle->{{0, 1024}, {0, 723}}, AutoGeneratedPackage->None, WindowToolbars->"RulerBar", WindowSize->{459, 310}, WindowMargins->{{250, Automatic}, {Automatic, -20}}, ShowSelection->True ] +++++++