Re: Some speedups
- To: mathgroup at smc.vnet.net
- Subject: [mg37319] Re: [mg37292] Some speedups
- From: Andrzej Kozlowski <andrzej at platon.c.u-tokyo.ac.jp>
- Date: Wed, 23 Oct 2002 02:56:56 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Allan's posting made me realize a problem that affects all users of Mathematica and Apple's Mail.app for MacOS X. If you use Mail.app and carry out Allan's instruction below you will not get the expected result. The reason is that Mail.app inserts an extra character into the text when pasting into a Carbon application (like the Mathematica Front End). This ruins Mathematica's ability to interpret the text as a Mathematica expression. One way around this problem is to paste the expression first into TextEdit and then copy it form TextEdit and paste into Mathematica. I think this is a good place make a plea to Wolfram Research to seriously consider a Cocoa FrontEnd for MacOS X Mathematica. It is in fact somewhat surprising that it has not been done yet, given the advantages of Cocoa and the fact that it is essentially NextStep, with which WRI has a longer experience than most software developers. Andrzej Kozlowski Yokohama, Japan http://www.mimuw.edu.pl/~akoz/ http://platon.c.u-tokyo.ac.jp/andrzej/ On Tuesday, October 22, 2002, at 05:47 PM, Allan Hayes wrote: > The notebook code below contains ways of speeding up some uses of > Table, > Sort and Split and avoiding building up inefficienlty large > intermediate > expressions. > I put them together while looking into the various solutions in the > thread > "grouping and averaging {x,y} pairs of data" -- they helped give what > seems > to be the fastest uncompiled solution yet. > > Allan > > --------------------- > Allan Hayes > Mathematica Training and Consulting > Leicester UK > www.haystack.demon.co.uk > hay at haystack.demon.co.uk > Voice: +44 (0)116 271 4198 > Fax: +44 (0)870 164 0565 > > > To make a notebook from the following, copy between the two lines > +++++++, > past into a notebook, click "Yes" in the panel that pops up. > > +++++++ > > Notebook[{ > > Cell[CellGroupData[{ > Cell[TextData[{ > "Speedups \n", > StyleBox["Table, Sort, Split", > FontSize->18] > }], "Title"], > > Cell["Allan Hayes October 2002", "Subsubtitle"], > > Cell[BoxData[ > \(Off[General::"\<spell1\>", General::"\<spell\>"]\)], "Input", > InitializationCell->True], > > Cell[CellGroupData[{ > > Cell["General Techniques", "Section"], > > Cell[CellGroupData[{ > > Cell["Table", "Subsection"], > > Cell[TextData[{ > "The following illustrates a generally applicable point: that \ > inserting large expressions directly using a function or ", > StyleBox["With", > FontFamily->"Courier"], > " can lead to inefficiently large expressions. This can be avoided \ > by storing in a local variable." > }], "Text"], > > Cell[BoxData[{ > \(\(ranges = \ Table[{1, 200}, {10000}];\)\), "\n", > \(\(data\ = \ Range[100000];\)\)}], "Input"], > > Cell["This is quite slow.", "Text"], > > Cell[BoxData[ > \(takesX[data_, \ > ranges_] := \[IndentingNewLine]\(Take[data, #] &\) /@ > ranges\)], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(takesX[data, \ ranges];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({3.789999999999992`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell[TextData[{ > "The use of the local variable ", > StyleBox["dd", > FontFamily->"Courier"], > " below gives much quicker code" > }], "Text"], > > Cell[BoxData[ > \(takes[data_, \ > ranges_] := \[IndentingNewLine]Module[{dd = > data}, \(Take[dd, #] &\) /@ ranges]\)], "Input", > InitializationCell->True, > FontColor->RGBColor[0, 0, 1]], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(takes[data, \ ranges];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({1.4899999999999807`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell[TextData[{ > "We might have expected that the direct insertion of ", > StyleBox["data", > FontFamily->"Courier"], > " into ", > StyleBox["Take[data,#]&", > FontFamily->"Courier New"], > " would be quicker, but it looks as if the very big expression that \ > is built up when using this technique slows things down.\nIn the \ > second form ", > StyleBox["dd", > FontFamily->"Courier"], > " is evaluated once as each expression\n\t\t", > StyleBox["Take[dd,#]&[{1,200}]", > FontFamily->"Courier New"], > "\nis evaluated." > }], "Text"], > > Cell[TextData[{ > "As an extra check, use ", > StyleBox["With", > FontFamily->"Courier"], > " to input directly:" > }], "Text"], > > Cell[BoxData[ > \(takesY[data_, \ > ranges_] := \[IndentingNewLine]With[{dd = > data}, \(Take[dd, #] &\) /@ ranges]\)], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(takesY[data, \ ranges];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({3.240000000000002`\ Second, Null}\)], "Output"] > }, Closed]] > }, Closed]], > > Cell[CellGroupData[{ > > Cell["Sort", "Subsection"], > > Cell[BoxData[{ > \(\(k = 2;\)\), "\[IndentingNewLine]", > \(\(data = \((SeedRandom[0]; > Table[Random[ > Integer, {1, 100}], {100000}, {k}])\);\)\)}], "Input"], > > Cell["Let's sort the data:", "Text"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(Sort[data];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({1.9299999999999997`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell["\<\ > Sorting by the first coordinate might be expected to be quicker, but \ > the following slows the calculation down markedly!\ > \>", "Text"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(Sort[data, #1[\([1]\)] \[LessEqual] #2[\([1]\)] &];\) // > Timing\)], "Input"], > > Cell[BoxData[ > \({150.16`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell[TextData[{ > "But using ", > StyleBox["Ordering", > FontFamily->"Courier"], > ", as below will usually speed things up (the advantage increases \ > with the size of k)." > }], "Text"], > > Cell[BoxData[ > \(sortn[dat_, > n_] := \[IndentingNewLine]dat[\([Ordering[ > dat[\([All, n]\)]]]\)]\)], "Input", > InitializationCell->True, > FontColor->RGBColor[0, 0, 1]], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(sortn[data, 1];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({1.4799999999999898`\ Second, Null}\)], "Output"] > }, Closed]] > }, Closed]], > > Cell[CellGroupData[{ > > Cell["Split", "Subsection"], > > Cell[BoxData[{ > \(\(data = \((SeedRandom[0]; > Table[Random[ > Integer, {1, 100}], {100000}, {2}])\);\)\), "\n", > \(\(sd = \ Sort[data];\)\)}], "Input"], > > Cell[TextData[{ > "Split ", > StyleBox["data", > FontFamily->"Courier"], > " by the first coordinate" > }], "Text"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(Split[sd, \ #1[\([1]\)] \[Equal] #2[\([1]\)] &];\) // > Timing\)], "Input"], > > Cell[BoxData[ > \({10.549999999999999`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell["The following (following Hartmut Wolf) is quicker ", "Text"], > > Cell[BoxData[ > \(splitnX[dat_, > n_] := \[IndentingNewLine]Module[{s, p, > tks}, \[IndentingNewLine]s = Split[dat[\([All, n]\)]]; > p = FoldList[\ #1 + #2 &, > 0, \ \ Length /@ s]; \[IndentingNewLine]\(Take[ > dat, #] &\) /@ > Transpose[{Drop[p, \(-1\)] + 1, > Rest[p]}]\[IndentingNewLine]]\)], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(splitnX[sd, 1];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({4.939999999999998`\ Second, Null}\)], "Output"] > }, Closed]], > > Cell[TextData[{ > "And using the trick we found above when looking at", > StyleBox[" Table ", > FontFamily->"Courier New"], > "it gets still quicker" > }], "Text"], > > Cell[BoxData[ > \(splitn[dat_, > n_] := \[IndentingNewLine]Module[{s, p, tks, > dd}, \[IndentingNewLine]ss = \(s = > Split[dat[\([All, n]\)]]\); > p = FoldList[\ #1 + #2 &, > 0, \ \ Length /@ s]; \[IndentingNewLine]dd = \ > dat; \[IndentingNewLine]\(Take[dd, #] &\) /@ > Transpose[{Drop[p, \(-1\)] + 1, > Rest[p]}]\[IndentingNewLine]]\)], "Input", > InitializationCell->True, > FontColor->RGBColor[0, 0, 1]], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(\(splitn[sd, 1];\) // Timing\)], "Input"], > > Cell[BoxData[ > \({2.09`\ Second, Null}\)], "Output"] > }, Closed]] > }, Closed]] > }, Closed]], > > Cell[CellGroupData[{ > > Cell["Application to Binned Averages", "Section"], > > Cell["\<\ > Daniel Lichtblau's elegant compiled solution is quickest. The \ > uncompiled version, 2., comes fairly close, but will probably fall \ > further behind with very big inputs.\ > \>", "Text"], > > Cell[TextData[{ > "To make the timings more comparable, each section below starts \ > with ", > StyleBox["Quit", > FontFamily->"Courier New"], > ". This must be evaluated separately before the rest of the code. \ > You will be asked if you want to evaluate the initialization cells \ > when continuing with rest of the code - click \"Yes\". " > }], "Text"], > > Cell[CellGroupData[{ > > Cell["1. Hartmut Wolf", "Subsection", > CellDingbat->None, > CellMargins->{{10, Inherited}, {Inherited, Inherited}}], > > Cell[BoxData[ > \(Quit\)], "Input"], > > Cell[BoxData[ > \(\(data = \((SeedRandom[0]; > Table[Random[ > Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], > > Cell[BoxData[ > \(\(\(binnedAverage1[data_, max_] := > Module[{v, i, ix, ixx, ixxx}, {i, v} = > With[{rr = Range[max]}, > Transpose[ > Sort[Join[data, > Transpose[{rr, > rr - rr}]]]]]; \[IndentingNewLine]ix = > Split[i]; \[IndentingNewLine]ixx = > FoldList[Plus[#1, Length[#2]] &, 0, > ix]; \[IndentingNewLine]ixxx = > Transpose[ > Transpose[Partition[ixx, 2, 1]] + {1, > 0}]; \[IndentingNewLine]Transpose[{First /@ > ix, \(\(\((Plus @@ #)\)/Max[Length[#] - 1, 1] &\)[ > Take[v, #]] &\) /@ ixxx}]]\)\(\n\) > \)\)], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(Table[\(\(\((r1 = binnedAverage1[data, 100])\);\) // Timing\) // > First, {5}]\)], "Input"], > > Cell[BoxData[ > \({3.790000000000001`\ Second, 4.609999999999999`\ Second, > 4.17`\ Second, 5.`\ Second, > 4.010000000000002`\ Second}\)], "Output"] > }, Closed]] > }, Closed]], > > Cell[CellGroupData[{ > > Cell["2. Allan Hayes", "Subsection", > CellDingbat->None, > CellMargins->{{10, Inherited}, {Inherited, Inherited}}], > > Cell[TextData[{ > "Note that ", > StyleBox["Tr[lst]", > FontFamily->"Courier"], > StyleBox[" is used ", > FontFamily->"Times New Roman"], > StyleBox["ins", > FontFamily->"Times New Roman"], > "tead o", > StyleBox["f ", > FontFamily->"Times New Roman"], > StyleBox["Plus@@lst", > FontFamily->"Courier"], > StyleBox[" ", > FontFamily->"Times New Roman"], > StyleBox["to sum", > FontFamily->"Times New Roman"], > StyleBox[" ", > FontFamily->"Times New Roman"], > StyleBox["lst", > FontFamily->"Courier"], > "." > }], "Text"], > > Cell[BoxData[ > \(Quit\)], "Input"], > > Cell[BoxData[ > \(\(data = \((SeedRandom[0]; > Table[Random[ > Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], > > Cell[BoxData[ > \(binnedAverage2[data_, \ > max_: 0] := \[IndentingNewLine]Module[{spl, avs, inds, dd, > zs}, \[IndentingNewLine]spl = > splitn[sortn[data, 1], > 1]; \[IndentingNewLine]avs = \(\((Tr[#1[\([All, 2]\)]]/ > Length[#1])\) &\) /@ > spl; \[IndentingNewLine]inds = \ > spl[\([All, 1, 1]\)]; \[IndentingNewLine]dd\ = \ > Range[Max[Last[inds], max]]; \[IndentingNewLine]zs = \ > dd - dd; \[IndentingNewLine]zs[\([inds]\)] = \ > avs; \[IndentingNewLine]Transpose[{dd, > zs}]\[IndentingNewLine]]\)], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(Table[\(binnedAverage2[data] // Timing\) // > First, {5}]\)], "Input"], > > Cell[BoxData[ > \({2.8100000000000005`\ Second, 3.0700000000000003`\ Second, > 2.799999999999999`\ Second, 2.799999999999999`\ Second, > 2.75`\ Second}\)], "Output"] > }, Closed]] > }, Closed]], > > Cell[CellGroupData[{ > > Cell["3. Daniel Lichtblau", "Subsection", > CellDingbat->None, > CellMargins->{{10, Inherited}, {Inherited, Inherited}}], > > Cell[BoxData[ > \(Quit\)], "Input"], > > Cell[BoxData[ > \(\(data = \((SeedRandom[0]; > Table[Random[ > Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"], > > Cell[BoxData[ > StyleBox[\(averageByBin3[ > data : {{_, _} .. }]\ := \ \[IndentingNewLine]Module[\n\ > {res = Transpose[data]}, \n > res\ = \ averagebyBinC[res[\([1]\)], res[\([2]\)]]; \n > Transpose[{Range[Length[res]], res}]\n]\), > FormatType->StandardForm]], "Input"], > > Cell[BoxData[ > StyleBox[\(\(\n\)\(\(averagebyBinC\ = \ > Compile[{{xvals, _Integer, 1}, {yvals, _Real, 1}}, \n > Module[{len = Max[xvals], \ binsizes, \ averages, > len2 = Length[xvals], \ indx}, \n > binsizes\ = \ Table[0, {len}]; \n > averages\ = \ Table[0. , {len}]; \n > Do\ [\nindx\ = \ > xvals[\([j]\)]; \n\(binsizes[\([indx]\)]++\); \n > averages[\([indx]\)]\ += \ yvals[\([j]\)], \n{j, > len2}\n]; \n > binsizes\ = \ Map[If\ [# == 0, 1, #] &, \ binsizes]; \n > averages\ /\ binsizes\n]\n];\)\(\n\) > \)\), > FormatType->StandardForm]], "Input"], > > Cell[CellGroupData[{ > > Cell[BoxData[ > \(Table[\(\(averageByBin3[data];\) // Timing\) // > First, {5}]\)], "Input"], > > Cell[BoxData[ > \({2.3099999999999996`\ Second, 2.1400000000000006`\ Second, > 2.3599999999999994`\ Second, 2.3100000000000005`\ Second, > 2.3599999999999994`\ Second}\)], "Output"] > }, Closed]] > }, Closed]] > }, Closed]] > }, Open ]] > }, > FrontEndVersion->"4.2 for Microsoft Windows", > ScreenRectangle->{{0, 1024}, {0, 723}}, > AutoGeneratedPackage->None, > WindowToolbars->"RulerBar", > WindowSize->{459, 310}, > WindowMargins->{{250, Automatic}, {Automatic, -20}}, > ShowSelection->True > ] > > +++++++ > > > > > >