       Some speedups

• To: mathgroup at smc.vnet.net
• Subject: [mg37292] Some speedups
• From: "Allan Hayes" <hay at haystack.demon.co.uk>
• Date: Tue, 22 Oct 2002 04:47:13 -0400 (EDT)
• References: <aookqg\$pkq\$1@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```The notebook code below contains ways of speeding up some uses of  Table,
Sort and Split and avoiding building up inefficienlty large intermediate
expressions.
I put them together while looking into the various solutions in the thread
"grouping and averaging {x,y} pairs of data" -- they helped give what seems
to be the fastest uncompiled solution yet.

Allan

---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565

To make a notebook from the following, copy between the two lines +++++++,
past into a notebook, click "Yes" in the panel that pops up.

+++++++

Notebook[{

Cell[CellGroupData[{
Cell[TextData[{
"Speedups \n",
StyleBox["Table, Sort, Split",
FontSize->18]
}], "Title"],

Cell["Allan Hayes October 2002", "Subsubtitle"],

Cell[BoxData[
\(Off[General::"\<spell1\>", General::"\<spell\>"]\)], "Input",
InitializationCell->True],

Cell[CellGroupData[{

Cell["General Techniques", "Section"],

Cell[CellGroupData[{

Cell["Table", "Subsection"],

Cell[TextData[{
"The following illustrates a generally applicable point: that \
inserting large expressions directly using a function or ",
StyleBox["With",
FontFamily->"Courier"],
" can lead to inefficiently large expressions. This can be avoided \
by storing in a local variable."
}], "Text"],

Cell[BoxData[{
\(\(ranges = \ Table[{1, 200}, {10000}];\)\), "\n",
\(\(data\  = \ Range;\)\)}], "Input"],

Cell["This is quite slow.", "Text"],

Cell[BoxData[
\(takesX[data_, \
ranges_] := \[IndentingNewLine]\(Take[data, #] &\) /@
ranges\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(takesX[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
\({3.789999999999992`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
"The use of the local variable ",
StyleBox["dd",
FontFamily->"Courier"],
" below gives much quicker code"
}], "Text"],

Cell[BoxData[
\(takes[data_, \
ranges_] := \[IndentingNewLine]Module[{dd =
data}, \(Take[dd, #] &\) /@ ranges]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
\(\(takes[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
\({1.4899999999999807`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
"We might have expected that the direct insertion of ",
StyleBox["data",
FontFamily->"Courier"],
" into ",
StyleBox["Take[data,#]&",
FontFamily->"Courier New"],
" would be quicker, but it looks as if the very big expression that \
is built up when using this technique slows things down.\nIn the \
second form ",
StyleBox["dd",
FontFamily->"Courier"],
" is evaluated once as each expression\n\t\t",
StyleBox["Take[dd,#]&[{1,200}]",
FontFamily->"Courier New"],
"\nis evaluated."
}], "Text"],

Cell[TextData[{
"As an extra check, use ",
StyleBox["With",
FontFamily->"Courier"],
" to input directly:"
}], "Text"],

Cell[BoxData[
\(takesY[data_, \
ranges_] := \[IndentingNewLine]With[{dd =
data}, \(Take[dd, #] &\) /@ ranges]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(takesY[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
\({3.240000000000002`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Sort", "Subsection"],

Cell[BoxData[{
\(\(k = 2;\)\), "\[IndentingNewLine]",
\(\(data = \((SeedRandom;
Table[Random[
Integer, {1, 100}], {100000}, {k}])\);\)\)}], "Input"],

Cell["Let's sort the data:", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(Sort[data];\) // Timing\)], "Input"],

Cell[BoxData[
\({1.9299999999999997`\ Second, Null}\)], "Output"]
}, Closed]],

Cell["\<\
Sorting by the first coordinate might be expected to be quicker, but \
the following  slows the calculation down markedly!\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(Sort[data, #1[\(\)] \[LessEqual] #2[\(\)] &];\) //
Timing\)], "Input"],

Cell[BoxData[
\({150.16`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
"But using ",
StyleBox["Ordering",
FontFamily->"Courier"],
", as below will usually speed things up (the advantage increases \
with the size of k)."
}], "Text"],

Cell[BoxData[
\(sortn[dat_,
n_] := \[IndentingNewLine]dat[\([Ordering[
dat[\([All, n]\)]]]\)]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
\(\(sortn[data, 1];\) // Timing\)], "Input"],

Cell[BoxData[
\({1.4799999999999898`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Split", "Subsection"],

Cell[BoxData[{
\(\(data = \((SeedRandom;
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\), "\n",
\(\(sd = \ Sort[data];\)\)}], "Input"],

Cell[TextData[{
"Split  ",
StyleBox["data",
FontFamily->"Courier"],
"  by the first coordinate"
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(Split[sd, \ #1[\(\)] \[Equal] #2[\(\)] &];\) //
Timing\)], "Input"],

Cell[BoxData[
\({10.549999999999999`\ Second, Null}\)], "Output"]
}, Closed]],

Cell["The following (following Hartmut Wolf) is quicker ", "Text"],

Cell[BoxData[
\(splitnX[dat_,
n_] := \[IndentingNewLine]Module[{s, p,
tks}, \[IndentingNewLine]s = Split[dat[\([All, n]\)]];
p = FoldList[\ #1 + #2 &,
0, \ \ Length /@ s]; \[IndentingNewLine]\(Take[
dat, #] &\) /@
Transpose[{Drop[p, \(-1\)] + 1,
Rest[p]}]\[IndentingNewLine]]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(\(splitnX[sd, 1];\) // Timing\)], "Input"],

Cell[BoxData[
\({4.939999999999998`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
"And using the trick we found above when looking at",
StyleBox[" Table ",
FontFamily->"Courier New"],
"it gets still quicker"
}], "Text"],

Cell[BoxData[
\(splitn[dat_,
n_] := \[IndentingNewLine]Module[{s, p, tks,
dd}, \[IndentingNewLine]ss = \(s =
Split[dat[\([All, n]\)]]\);
p = FoldList[\ #1 + #2 &,
0, \ \ Length /@ s]; \[IndentingNewLine]dd = \
dat; \[IndentingNewLine]\(Take[dd, #] &\) /@
Transpose[{Drop[p, \(-1\)] + 1,
Rest[p]}]\[IndentingNewLine]]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
\(\(splitn[sd, 1];\) // Timing\)], "Input"],

Cell[BoxData[
\({2.09`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Application to Binned Averages", "Section"],

Cell["\<\
Daniel Lichtblau's elegant compiled solution is quickest. The \
uncompiled version, 2.,  comes fairly close, but will probably fall \
further behind with very big inputs.\
\>", "Text"],

Cell[TextData[{
"To make the timings more comparable, each section below starts \
with ",
StyleBox["Quit",
FontFamily->"Courier New"],
". This must be evaluated separately before the rest of the code. \
You will be asked if you want to evaluate the initialization cells \
when continuing with rest of the code - click \"Yes\". "
}], "Text"],

Cell[CellGroupData[{

Cell["1. Hartmut Wolf", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[BoxData[
\(Quit\)], "Input"],

Cell[BoxData[
\(\(data = \((SeedRandom;
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
\(\(\(binnedAverage1[data_, max_] :=
Module[{v, i, ix, ixx, ixxx}, {i, v} =
With[{rr = Range[max]},
Transpose[
Sort[Join[data,
Transpose[{rr,
rr - rr}]]]]]; \[IndentingNewLine]ix =
Split[i]; \[IndentingNewLine]ixx =
FoldList[Plus[#1, Length[#2]] &, 0,
ix]; \[IndentingNewLine]ixxx =
Transpose[
Transpose[Partition[ixx, 2, 1]] + {1,
0}]; \[IndentingNewLine]Transpose[{First /@
ix, \(\(\((Plus @@ #)\)/Max[Length[#] - 1, 1] &\)[
Take[v, #]] &\) /@ ixxx}]]\)\(\n\)
\)\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(Table[\(\(\((r1 = binnedAverage1[data, 100])\);\) // Timing\) //
First, {5}]\)], "Input"],

Cell[BoxData[
\({3.790000000000001`\ Second, 4.609999999999999`\ Second,
4.17`\ Second, 5.`\ Second,
4.010000000000002`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["2. Allan Hayes", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[TextData[{
"Note that ",
StyleBox["Tr[lst]",
FontFamily->"Courier"],
StyleBox[" is used ",
FontFamily->"Times New Roman"],
StyleBox["ins",
FontFamily->"Times New Roman"],
StyleBox["f ",
FontFamily->"Times New Roman"],
StyleBox["Plus@@lst",
FontFamily->"Courier"],
StyleBox[" ",
FontFamily->"Times New Roman"],
StyleBox["to sum",
FontFamily->"Times New Roman"],
StyleBox[" ",
FontFamily->"Times New Roman"],
StyleBox["lst",
FontFamily->"Courier"],
"."
}], "Text"],

Cell[BoxData[
\(Quit\)], "Input"],

Cell[BoxData[
\(\(data = \((SeedRandom;
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
\(binnedAverage2[data_, \
max_:  0] := \[IndentingNewLine]Module[{spl, avs, inds, dd,
zs}, \[IndentingNewLine]spl =
splitn[sortn[data, 1],
1]; \[IndentingNewLine]avs = \(\((Tr[#1[\([All, 2]\)]]/
Length[#1])\) &\) /@
spl; \[IndentingNewLine]inds = \
spl[\([All, 1, 1]\)]; \[IndentingNewLine]dd\  = \
Range[Max[Last[inds], max]]; \[IndentingNewLine]zs = \
dd - dd; \[IndentingNewLine]zs[\([inds]\)] = \
avs; \[IndentingNewLine]Transpose[{dd,
zs}]\[IndentingNewLine]]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(Table[\(binnedAverage2[data] // Timing\) //
First, {5}]\)], "Input"],

Cell[BoxData[
\({2.8100000000000005`\ Second, 3.0700000000000003`\ Second,
2.799999999999999`\ Second, 2.799999999999999`\ Second,
2.75`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["3. Daniel Lichtblau", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[BoxData[
\(Quit\)], "Input"],

Cell[BoxData[
\(\(data = \((SeedRandom;
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
StyleBox[\(averageByBin3[
data : {{_, _} .. }]\  := \ \[IndentingNewLine]Module[\n\
{res = Transpose[data]}, \n
res\  = \ averagebyBinC[res[\(\)], res[\(\)]]; \n
Transpose[{Range[Length[res]], res}]\n]\),
FormatType->StandardForm]], "Input"],

Cell[BoxData[
StyleBox[\(\(\n\)\(\(averagebyBinC\  = \
Compile[{{xvals, _Integer, 1}, {yvals, _Real, 1}}, \n
Module[{len = Max[xvals], \ binsizes, \ averages,
len2 = Length[xvals], \ indx}, \n
binsizes\  = \ Table[0, {len}]; \n
averages\  = \ Table[0. , {len}]; \n
Do\ [\nindx\  = \
xvals[\([j]\)]; \n\(binsizes[\([indx]\)]++\); \n
averages[\([indx]\)]\  += \ yvals[\([j]\)], \n{j,
len2}\n]; \n
binsizes\  = \ Map[If\ [# == 0, 1, #] &, \ binsizes]; \n
averages\ /\ binsizes\n]\n];\)\(\n\)
\)\),
FormatType->StandardForm]], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
\(Table[\(\(averageByBin3[data];\) // Timing\) //
First, {5}]\)], "Input"],

Cell[BoxData[
\({2.3099999999999996`\ Second, 2.1400000000000006`\ Second,
2.3599999999999994`\ Second, 2.3100000000000005`\ Second,
2.3599999999999994`\ Second}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]]
}, Open  ]]
},
FrontEndVersion->"4.2 for Microsoft Windows",
ScreenRectangle->{{0, 1024}, {0, 723}},
AutoGeneratedPackage->None,
WindowToolbars->"RulerBar",
WindowSize->{459, 310},
WindowMargins->{{250, Automatic}, {Automatic, -20}},
ShowSelection->True
]

+++++++

```

• Prev by Date: Comparison of Mathematica on Various Computers
• Next by Date: Getting symbols out of list returned by solve
• Previous by thread: Mathematica 4.0 on Windows 98
• Next by thread: Getting symbols out of list returned by solve