Some speedups
- To: mathgroup at smc.vnet.net
- Subject: [mg37292] Some speedups
- From: "Allan Hayes" <hay at haystack.demon.co.uk>
- Date: Tue, 22 Oct 2002 04:47:13 -0400 (EDT)
- References: <aookqg$pkq$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
The notebook code below contains ways of speeding up some uses of Table,
Sort and Split and avoiding building up inefficienlty large intermediate
expressions.
I put them together while looking into the various solutions in the thread
"grouping and averaging {x,y} pairs of data" -- they helped give what seems
to be the fastest uncompiled solution yet.
Allan
---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565
To make a notebook from the following, copy between the two lines +++++++,
past into a notebook, click "Yes" in the panel that pops up.
+++++++
Notebook[{
Cell[CellGroupData[{
Cell[TextData[{
"Speedups \n",
StyleBox["Table, Sort, Split",
FontSize->18]
}], "Title"],
Cell["Allan Hayes October 2002", "Subsubtitle"],
Cell[BoxData[
\(Off[General::"\<spell1\>", General::"\<spell\>"]\)], "Input",
InitializationCell->True],
Cell[CellGroupData[{
Cell["General Techniques", "Section"],
Cell[CellGroupData[{
Cell["Table", "Subsection"],
Cell[TextData[{
"The following illustrates a generally applicable point: that \
inserting large expressions directly using a function or ",
StyleBox["With",
FontFamily->"Courier"],
" can lead to inefficiently large expressions. This can be avoided \
by storing in a local variable."
}], "Text"],
Cell[BoxData[{
\(\(ranges = \ Table[{1, 200}, {10000}];\)\), "\n",
\(\(data\ = \ Range[100000];\)\)}], "Input"],
Cell["This is quite slow.", "Text"],
Cell[BoxData[
\(takesX[data_, \
ranges_] := \[IndentingNewLine]\(Take[data, #] &\) /@
ranges\)], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(takesX[data, \ ranges];\) // Timing\)], "Input"],
Cell[BoxData[
\({3.789999999999992`\ Second, Null}\)], "Output"]
}, Closed]],
Cell[TextData[{
"The use of the local variable ",
StyleBox["dd",
FontFamily->"Courier"],
" below gives much quicker code"
}], "Text"],
Cell[BoxData[
\(takes[data_, \
ranges_] := \[IndentingNewLine]Module[{dd =
data}, \(Take[dd, #] &\) /@ ranges]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],
Cell[CellGroupData[{
Cell[BoxData[
\(\(takes[data, \ ranges];\) // Timing\)], "Input"],
Cell[BoxData[
\({1.4899999999999807`\ Second, Null}\)], "Output"]
}, Closed]],
Cell[TextData[{
"We might have expected that the direct insertion of ",
StyleBox["data",
FontFamily->"Courier"],
" into ",
StyleBox["Take[data,#]&",
FontFamily->"Courier New"],
" would be quicker, but it looks as if the very big expression that \
is built up when using this technique slows things down.\nIn the \
second form ",
StyleBox["dd",
FontFamily->"Courier"],
" is evaluated once as each expression\n\t\t",
StyleBox["Take[dd,#]&[{1,200}]",
FontFamily->"Courier New"],
"\nis evaluated."
}], "Text"],
Cell[TextData[{
"As an extra check, use ",
StyleBox["With",
FontFamily->"Courier"],
" to input directly:"
}], "Text"],
Cell[BoxData[
\(takesY[data_, \
ranges_] := \[IndentingNewLine]With[{dd =
data}, \(Take[dd, #] &\) /@ ranges]\)], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(takesY[data, \ ranges];\) // Timing\)], "Input"],
Cell[BoxData[
\({3.240000000000002`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],
Cell[CellGroupData[{
Cell["Sort", "Subsection"],
Cell[BoxData[{
\(\(k = 2;\)\), "\[IndentingNewLine]",
\(\(data = \((SeedRandom[0];
Table[Random[
Integer, {1, 100}], {100000}, {k}])\);\)\)}], "Input"],
Cell["Let's sort the data:", "Text"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(Sort[data];\) // Timing\)], "Input"],
Cell[BoxData[
\({1.9299999999999997`\ Second, Null}\)], "Output"]
}, Closed]],
Cell["\<\
Sorting by the first coordinate might be expected to be quicker, but \
the following slows the calculation down markedly!\
\>", "Text"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(Sort[data, #1[\([1]\)] \[LessEqual] #2[\([1]\)] &];\) //
Timing\)], "Input"],
Cell[BoxData[
\({150.16`\ Second, Null}\)], "Output"]
}, Closed]],
Cell[TextData[{
"But using ",
StyleBox["Ordering",
FontFamily->"Courier"],
", as below will usually speed things up (the advantage increases \
with the size of k)."
}], "Text"],
Cell[BoxData[
\(sortn[dat_,
n_] := \[IndentingNewLine]dat[\([Ordering[
dat[\([All, n]\)]]]\)]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],
Cell[CellGroupData[{
Cell[BoxData[
\(\(sortn[data, 1];\) // Timing\)], "Input"],
Cell[BoxData[
\({1.4799999999999898`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],
Cell[CellGroupData[{
Cell["Split", "Subsection"],
Cell[BoxData[{
\(\(data = \((SeedRandom[0];
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\), "\n",
\(\(sd = \ Sort[data];\)\)}], "Input"],
Cell[TextData[{
"Split ",
StyleBox["data",
FontFamily->"Courier"],
" by the first coordinate"
}], "Text"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(Split[sd, \ #1[\([1]\)] \[Equal] #2[\([1]\)] &];\) //
Timing\)], "Input"],
Cell[BoxData[
\({10.549999999999999`\ Second, Null}\)], "Output"]
}, Closed]],
Cell["The following (following Hartmut Wolf) is quicker ", "Text"],
Cell[BoxData[
\(splitnX[dat_,
n_] := \[IndentingNewLine]Module[{s, p,
tks}, \[IndentingNewLine]s = Split[dat[\([All, n]\)]];
p = FoldList[\ #1 + #2 &,
0, \ \ Length /@ s]; \[IndentingNewLine]\(Take[
dat, #] &\) /@
Transpose[{Drop[p, \(-1\)] + 1,
Rest[p]}]\[IndentingNewLine]]\)], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(\(splitnX[sd, 1];\) // Timing\)], "Input"],
Cell[BoxData[
\({4.939999999999998`\ Second, Null}\)], "Output"]
}, Closed]],
Cell[TextData[{
"And using the trick we found above when looking at",
StyleBox[" Table ",
FontFamily->"Courier New"],
"it gets still quicker"
}], "Text"],
Cell[BoxData[
\(splitn[dat_,
n_] := \[IndentingNewLine]Module[{s, p, tks,
dd}, \[IndentingNewLine]ss = \(s =
Split[dat[\([All, n]\)]]\);
p = FoldList[\ #1 + #2 &,
0, \ \ Length /@ s]; \[IndentingNewLine]dd = \
dat; \[IndentingNewLine]\(Take[dd, #] &\) /@
Transpose[{Drop[p, \(-1\)] + 1,
Rest[p]}]\[IndentingNewLine]]\)], "Input",
InitializationCell->True,
FontColor->RGBColor[0, 0, 1]],
Cell[CellGroupData[{
Cell[BoxData[
\(\(splitn[sd, 1];\) // Timing\)], "Input"],
Cell[BoxData[
\({2.09`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]],
Cell[CellGroupData[{
Cell["Application to Binned Averages", "Section"],
Cell["\<\
Daniel Lichtblau's elegant compiled solution is quickest. The \
uncompiled version, 2., comes fairly close, but will probably fall \
further behind with very big inputs.\
\>", "Text"],
Cell[TextData[{
"To make the timings more comparable, each section below starts \
with ",
StyleBox["Quit",
FontFamily->"Courier New"],
". This must be evaluated separately before the rest of the code. \
You will be asked if you want to evaluate the initialization cells \
when continuing with rest of the code - click \"Yes\". "
}], "Text"],
Cell[CellGroupData[{
Cell["1. Hartmut Wolf", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],
Cell[BoxData[
\(Quit\)], "Input"],
Cell[BoxData[
\(\(data = \((SeedRandom[0];
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],
Cell[BoxData[
\(\(\(binnedAverage1[data_, max_] :=
Module[{v, i, ix, ixx, ixxx}, {i, v} =
With[{rr = Range[max]},
Transpose[
Sort[Join[data,
Transpose[{rr,
rr - rr}]]]]]; \[IndentingNewLine]ix =
Split[i]; \[IndentingNewLine]ixx =
FoldList[Plus[#1, Length[#2]] &, 0,
ix]; \[IndentingNewLine]ixxx =
Transpose[
Transpose[Partition[ixx, 2, 1]] + {1,
0}]; \[IndentingNewLine]Transpose[{First /@
ix, \(\(\((Plus @@ #)\)/Max[Length[#] - 1, 1] &\)[
Take[v, #]] &\) /@ ixxx}]]\)\(\n\)
\)\)], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(Table[\(\(\((r1 = binnedAverage1[data, 100])\);\) // Timing\) //
First, {5}]\)], "Input"],
Cell[BoxData[
\({3.790000000000001`\ Second, 4.609999999999999`\ Second,
4.17`\ Second, 5.`\ Second,
4.010000000000002`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],
Cell[CellGroupData[{
Cell["2. Allan Hayes", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],
Cell[TextData[{
"Note that ",
StyleBox["Tr[lst]",
FontFamily->"Courier"],
StyleBox[" is used ",
FontFamily->"Times New Roman"],
StyleBox["ins",
FontFamily->"Times New Roman"],
"tead o",
StyleBox["f ",
FontFamily->"Times New Roman"],
StyleBox["Plus@@lst",
FontFamily->"Courier"],
StyleBox[" ",
FontFamily->"Times New Roman"],
StyleBox["to sum",
FontFamily->"Times New Roman"],
StyleBox[" ",
FontFamily->"Times New Roman"],
StyleBox["lst",
FontFamily->"Courier"],
"."
}], "Text"],
Cell[BoxData[
\(Quit\)], "Input"],
Cell[BoxData[
\(\(data = \((SeedRandom[0];
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],
Cell[BoxData[
\(binnedAverage2[data_, \
max_: 0] := \[IndentingNewLine]Module[{spl, avs, inds, dd,
zs}, \[IndentingNewLine]spl =
splitn[sortn[data, 1],
1]; \[IndentingNewLine]avs = \(\((Tr[#1[\([All, 2]\)]]/
Length[#1])\) &\) /@
spl; \[IndentingNewLine]inds = \
spl[\([All, 1, 1]\)]; \[IndentingNewLine]dd\ = \
Range[Max[Last[inds], max]]; \[IndentingNewLine]zs = \
dd - dd; \[IndentingNewLine]zs[\([inds]\)] = \
avs; \[IndentingNewLine]Transpose[{dd,
zs}]\[IndentingNewLine]]\)], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(Table[\(binnedAverage2[data] // Timing\) //
First, {5}]\)], "Input"],
Cell[BoxData[
\({2.8100000000000005`\ Second, 3.0700000000000003`\ Second,
2.799999999999999`\ Second, 2.799999999999999`\ Second,
2.75`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],
Cell[CellGroupData[{
Cell["3. Daniel Lichtblau", "Subsection",
CellDingbat->None,
CellMargins->{{10, Inherited}, {Inherited, Inherited}}],
Cell[BoxData[
\(Quit\)], "Input"],
Cell[BoxData[
\(\(data = \((SeedRandom[0];
Table[Random[
Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],
Cell[BoxData[
StyleBox[\(averageByBin3[
data : {{_, _} .. }]\ := \ \[IndentingNewLine]Module[\n\
{res = Transpose[data]}, \n
res\ = \ averagebyBinC[res[\([1]\)], res[\([2]\)]]; \n
Transpose[{Range[Length[res]], res}]\n]\),
FormatType->StandardForm]], "Input"],
Cell[BoxData[
StyleBox[\(\(\n\)\(\(averagebyBinC\ = \
Compile[{{xvals, _Integer, 1}, {yvals, _Real, 1}}, \n
Module[{len = Max[xvals], \ binsizes, \ averages,
len2 = Length[xvals], \ indx}, \n
binsizes\ = \ Table[0, {len}]; \n
averages\ = \ Table[0. , {len}]; \n
Do\ [\nindx\ = \
xvals[\([j]\)]; \n\(binsizes[\([indx]\)]++\); \n
averages[\([indx]\)]\ += \ yvals[\([j]\)], \n{j,
len2}\n]; \n
binsizes\ = \ Map[If\ [# == 0, 1, #] &, \ binsizes]; \n
averages\ /\ binsizes\n]\n];\)\(\n\)
\)\),
FormatType->StandardForm]], "Input"],
Cell[CellGroupData[{
Cell[BoxData[
\(Table[\(\(averageByBin3[data];\) // Timing\) //
First, {5}]\)], "Input"],
Cell[BoxData[
\({2.3099999999999996`\ Second, 2.1400000000000006`\ Second,
2.3599999999999994`\ Second, 2.3100000000000005`\ Second,
2.3599999999999994`\ Second}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]]
}, Open ]]
},
FrontEndVersion->"4.2 for Microsoft Windows",
ScreenRectangle->{{0, 1024}, {0, 723}},
AutoGeneratedPackage->None,
WindowToolbars->"RulerBar",
WindowSize->{459, 310},
WindowMargins->{{250, Automatic}, {Automatic, -20}},
ShowSelection->True
]
+++++++