MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

Some speedups

  • To: mathgroup at smc.vnet.net
  • Subject: [mg37292] Some speedups
  • From: "Allan Hayes" <hay at haystack.demon.co.uk>
  • Date: Tue, 22 Oct 2002 04:47:13 -0400 (EDT)
  • References: <aookqg$pkq$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

The notebook code below contains ways of speeding up some uses of  Table,
Sort and Split and avoiding building up inefficienlty large intermediate
expressions.
I put them together while looking into the various solutions in the thread
"grouping and averaging {x,y} pairs of data" -- they helped give what seems
to be the fastest uncompiled solution yet.

Allan

---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565


To make a notebook from the following, copy between the two lines +++++++,
past into a notebook, click "Yes" in the panel that pops up.

+++++++

Notebook[{

Cell[CellGroupData[{
Cell[TextData[{
  "Speedups \n",
  StyleBox["Table, Sort, Split",
    FontSize->18]
}], "Title"],

Cell["Allan Hayes October 2002", "Subsubtitle"],

Cell[BoxData[
    \(Off[General::"\<spell1\>", General::"\<spell\>"]\)], "Input",
  InitializationCell->True],

Cell[CellGroupData[{

Cell["General Techniques", "Section"],

Cell[CellGroupData[{

Cell["Table", "Subsection"],

Cell[TextData[{
  "The following illustrates a generally applicable point: that \
inserting large expressions directly using a function or ",
  StyleBox["With",
    FontFamily->"Courier"],
  " can lead to inefficiently large expressions. This can be avoided \
by storing in a local variable."
}], "Text"],

Cell[BoxData[{
    \(\(ranges = \ Table[{1, 200}, {10000}];\)\), "\n",
    \(\(data\  = \ Range[100000];\)\)}], "Input"],

Cell["This is quite slow.", "Text"],

Cell[BoxData[
    \(takesX[data_, \
        ranges_] := \[IndentingNewLine]\(Take[data, #] &\) /@
        ranges\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(takesX[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
    \({3.789999999999992`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
  "The use of the local variable ",
  StyleBox["dd",
    FontFamily->"Courier"],
  " below gives much quicker code"
}], "Text"],

Cell[BoxData[
    \(takes[data_, \
        ranges_] := \[IndentingNewLine]Module[{dd =
            data}, \(Take[dd, #] &\) /@ ranges]\)], "Input",
  InitializationCell->True,
  FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(takes[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
    \({1.4899999999999807`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
  "We might have expected that the direct insertion of ",
  StyleBox["data",
    FontFamily->"Courier"],
  " into ",
  StyleBox["Take[data,#]&",
    FontFamily->"Courier New"],
  " would be quicker, but it looks as if the very big expression that \
is built up when using this technique slows things down.\nIn the \
second form ",
  StyleBox["dd",
    FontFamily->"Courier"],
  " is evaluated once as each expression\n\t\t",
  StyleBox["Take[dd,#]&[{1,200}]",
    FontFamily->"Courier New"],
  "\nis evaluated."
}], "Text"],

Cell[TextData[{
  "As an extra check, use ",
  StyleBox["With",
    FontFamily->"Courier"],
  " to input directly:"
}], "Text"],

Cell[BoxData[
    \(takesY[data_, \
        ranges_] := \[IndentingNewLine]With[{dd =
            data}, \(Take[dd, #] &\) /@ ranges]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(takesY[data, \ ranges];\) // Timing\)], "Input"],

Cell[BoxData[
    \({3.240000000000002`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Sort", "Subsection"],

Cell[BoxData[{
    \(\(k = 2;\)\), "\[IndentingNewLine]",
    \(\(data = \((SeedRandom[0];
          Table[Random[
              Integer, {1, 100}], {100000}, {k}])\);\)\)}], "Input"],

Cell["Let's sort the data:", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(Sort[data];\) // Timing\)], "Input"],

Cell[BoxData[
    \({1.9299999999999997`\ Second, Null}\)], "Output"]
}, Closed]],

Cell["\<\
Sorting by the first coordinate might be expected to be quicker, but \
the following  slows the calculation down markedly!\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(Sort[data, #1[\([1]\)] \[LessEqual] #2[\([1]\)] &];\) //
      Timing\)], "Input"],

Cell[BoxData[
    \({150.16`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
  "But using ",
  StyleBox["Ordering",
    FontFamily->"Courier"],
  ", as below will usually speed things up (the advantage increases \
with the size of k)."
}], "Text"],

Cell[BoxData[
    \(sortn[dat_,
        n_] := \[IndentingNewLine]dat[\([Ordering[
            dat[\([All, n]\)]]]\)]\)], "Input",
  InitializationCell->True,
  FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(sortn[data, 1];\) // Timing\)], "Input"],

Cell[BoxData[
    \({1.4799999999999898`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Split", "Subsection"],

Cell[BoxData[{
    \(\(data = \((SeedRandom[0];
          Table[Random[
              Integer, {1, 100}], {100000}, {2}])\);\)\), "\n",
    \(\(sd = \ Sort[data];\)\)}], "Input"],

Cell[TextData[{
  "Split  ",
  StyleBox["data",
    FontFamily->"Courier"],
  "  by the first coordinate"
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(Split[sd, \ #1[\([1]\)] \[Equal] #2[\([1]\)] &];\) //
      Timing\)], "Input"],

Cell[BoxData[
    \({10.549999999999999`\ Second, Null}\)], "Output"]
}, Closed]],

Cell["The following (following Hartmut Wolf) is quicker ", "Text"],

Cell[BoxData[
    \(splitnX[dat_,
        n_] := \[IndentingNewLine]Module[{s, p,
          tks}, \[IndentingNewLine]s = Split[dat[\([All, n]\)]];
        p = FoldList[\ #1 + #2 &,
            0, \ \ Length /@ s]; \[IndentingNewLine]\(Take[
              dat, #] &\) /@
          Transpose[{Drop[p, \(-1\)] + 1,
              Rest[p]}]\[IndentingNewLine]]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(splitnX[sd, 1];\) // Timing\)], "Input"],

Cell[BoxData[
    \({4.939999999999998`\ Second, Null}\)], "Output"]
}, Closed]],

Cell[TextData[{
  "And using the trick we found above when looking at",
  StyleBox[" Table ",
    FontFamily->"Courier New"],
  "it gets still quicker"
}], "Text"],

Cell[BoxData[
    \(splitn[dat_,
        n_] := \[IndentingNewLine]Module[{s, p, tks,
          dd}, \[IndentingNewLine]ss = \(s =
            Split[dat[\([All, n]\)]]\);
        p = FoldList[\ #1 + #2 &,
            0, \ \ Length /@ s]; \[IndentingNewLine]dd = \
          dat; \[IndentingNewLine]\(Take[dd, #] &\) /@
          Transpose[{Drop[p, \(-1\)] + 1,
              Rest[p]}]\[IndentingNewLine]]\)], "Input",
  InitializationCell->True,
  FontColor->RGBColor[0, 0, 1]],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(splitn[sd, 1];\) // Timing\)], "Input"],

Cell[BoxData[
    \({2.09`\ Second, Null}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["Application to Binned Averages", "Section"],

Cell["\<\
Daniel Lichtblau's elegant compiled solution is quickest. The \
uncompiled version, 2.,  comes fairly close, but will probably fall \
further behind with very big inputs.\
\>", "Text"],

Cell[TextData[{
  "To make the timings more comparable, each section below starts \
with ",
  StyleBox["Quit",
    FontFamily->"Courier New"],
  ". This must be evaluated separately before the rest of the code. \
You will be asked if you want to evaluate the initialization cells \
when continuing with rest of the code - click \"Yes\". "
}], "Text"],

Cell[CellGroupData[{

Cell["1. Hartmut Wolf", "Subsection",
  CellDingbat->None,
  CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[BoxData[
    \(Quit\)], "Input"],

Cell[BoxData[
    \(\(data = \((SeedRandom[0];
          Table[Random[
              Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
    \(\(\(binnedAverage1[data_, max_] :=
      Module[{v, i, ix, ixx, ixxx}, {i, v} =
          With[{rr = Range[max]},
            Transpose[
              Sort[Join[data,
                  Transpose[{rr,
                      rr - rr}]]]]]; \[IndentingNewLine]ix =
          Split[i]; \[IndentingNewLine]ixx =
          FoldList[Plus[#1, Length[#2]] &, 0,
            ix]; \[IndentingNewLine]ixxx =
          Transpose[
            Transpose[Partition[ixx, 2, 1]] + {1,
                0}]; \[IndentingNewLine]Transpose[{First /@
              ix, \(\(\((Plus @@ #)\)/Max[Length[#] - 1, 1] &\)[
                  Take[v, #]] &\) /@ ixxx}]]\)\(\n\)
    \)\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Table[\(\(\((r1 = binnedAverage1[data, 100])\);\) // Timing\) //
        First, {5}]\)], "Input"],

Cell[BoxData[
    \({3.790000000000001`\ Second, 4.609999999999999`\ Second,
      4.17`\ Second, 5.`\ Second,
      4.010000000000002`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["2. Allan Hayes", "Subsection",
  CellDingbat->None,
  CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[TextData[{
  "Note that ",
  StyleBox["Tr[lst]",
    FontFamily->"Courier"],
  StyleBox[" is used ",
    FontFamily->"Times New Roman"],
  StyleBox["ins",
    FontFamily->"Times New Roman"],
  "tead o",
  StyleBox["f ",
    FontFamily->"Times New Roman"],
  StyleBox["Plus@@lst",
    FontFamily->"Courier"],
  StyleBox[" ",
    FontFamily->"Times New Roman"],
  StyleBox["to sum",
    FontFamily->"Times New Roman"],
  StyleBox[" ",
    FontFamily->"Times New Roman"],
  StyleBox["lst",
    FontFamily->"Courier"],
  "."
}], "Text"],

Cell[BoxData[
    \(Quit\)], "Input"],

Cell[BoxData[
    \(\(data = \((SeedRandom[0];
          Table[Random[
              Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
    \(binnedAverage2[data_, \
        max_:  0] := \[IndentingNewLine]Module[{spl, avs, inds, dd,
          zs}, \[IndentingNewLine]spl =
          splitn[sortn[data, 1],
            1]; \[IndentingNewLine]avs = \(\((Tr[#1[\([All, 2]\)]]/
                  Length[#1])\) &\) /@
            spl; \[IndentingNewLine]inds = \
          spl[\([All, 1, 1]\)]; \[IndentingNewLine]dd\  = \
          Range[Max[Last[inds], max]]; \[IndentingNewLine]zs = \
          dd - dd; \[IndentingNewLine]zs[\([inds]\)] = \
          avs; \[IndentingNewLine]Transpose[{dd,
            zs}]\[IndentingNewLine]]\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Table[\(binnedAverage2[data] // Timing\) //
        First, {5}]\)], "Input"],

Cell[BoxData[
    \({2.8100000000000005`\ Second, 3.0700000000000003`\ Second,
      2.799999999999999`\ Second, 2.799999999999999`\ Second,
      2.75`\ Second}\)], "Output"]
}, Closed]]
}, Closed]],

Cell[CellGroupData[{

Cell["3. Daniel Lichtblau", "Subsection",
  CellDingbat->None,
  CellMargins->{{10, Inherited}, {Inherited, Inherited}}],

Cell[BoxData[
    \(Quit\)], "Input"],

Cell[BoxData[
    \(\(data = \((SeedRandom[0];
          Table[Random[
              Integer, {1, 100}], {100000}, {2}])\);\)\)], "Input"],

Cell[BoxData[
    StyleBox[\(averageByBin3[
          data : {{_, _} .. }]\  := \ \[IndentingNewLine]Module[\n\
{res = Transpose[data]}, \n
          res\  = \ averagebyBinC[res[\([1]\)], res[\([2]\)]]; \n
          Transpose[{Range[Length[res]], res}]\n]\),
      FormatType->StandardForm]], "Input"],

Cell[BoxData[
    StyleBox[\(\(\n\)\(\(averagebyBinC\  = \
          Compile[{{xvals, _Integer, 1}, {yvals, _Real, 1}}, \n
            Module[{len = Max[xvals], \ binsizes, \ averages,
                len2 = Length[xvals], \ indx}, \n
              binsizes\  = \ Table[0, {len}]; \n
              averages\  = \ Table[0. , {len}]; \n
              Do\ [\nindx\  = \
                  xvals[\([j]\)]; \n\(binsizes[\([indx]\)]++\); \n
                averages[\([indx]\)]\  += \ yvals[\([j]\)], \n{j,
                  len2}\n]; \n
              binsizes\  = \ Map[If\ [# == 0, 1, #] &, \ binsizes]; \n
              averages\ /\ binsizes\n]\n];\)\(\n\)
      \)\),
      FormatType->StandardForm]], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Table[\(\(averageByBin3[data];\) // Timing\) //
        First, {5}]\)], "Input"],

Cell[BoxData[
    \({2.3099999999999996`\ Second, 2.1400000000000006`\ Second,
      2.3599999999999994`\ Second, 2.3100000000000005`\ Second,
      2.3599999999999994`\ Second}\)], "Output"]
}, Closed]]
}, Closed]]
}, Closed]]
}, Open  ]]
},
FrontEndVersion->"4.2 for Microsoft Windows",
ScreenRectangle->{{0, 1024}, {0, 723}},
AutoGeneratedPackage->None,
WindowToolbars->"RulerBar",
WindowSize->{459, 310},
WindowMargins->{{250, Automatic}, {Automatic, -20}},
ShowSelection->True
]

+++++++





  • Prev by Date: Comparison of Mathematica on Various Computers
  • Next by Date: Getting symbols out of list returned by solve
  • Previous by thread: Mathematica 4.0 on Windows 98
  • Next by thread: Getting symbols out of list returned by solve