MathGroup Archive 2001

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: how to be as efficient as Mean[list] - efficient summing of function applied to list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg28547] Re: how to be as efficient as Mean[list] - efficient summing of function applied to list
  • From: Tom Burton <tburton at cts.com>
  • Date: Wed, 25 Apr 2001 19:21:55 -0400 (EDT)
  • References: <9c5o3v$iv1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Hello Matt,

Paste the Notebook expression at the end of this note into Mathematica to see some of my ideas about this.

Regards,

Tom

On 25 Apr 2001 01:42:23 -0400, in comp.soft-sys.math.mathematica you wrote:

>I have a one dimensional list of 28,900 data points.  I want to apply a
>function, called f[x_], to each element of this list and sum the resulting
>list to obtain one number.  For the following example, f[x_] := -Log[x].  My
>naive idea was to do something like this:
>
>Length[data]
>28900
>
>Apply[Plus,Map[f[#1]&,data]]//Timing
>{32.136 Second,-69627.8}
>
>However, this seemed to take to long when compared to other built-in
>Mathematic functions that iterate over a list and perform a sum.  For
>example, the Mean function on the same data takes two orders of magnitude
>less time than my function:
>
>Mean[data]//Timing
>{0.13 Second,-0.0296526}
>
>Is the "Apply[Plus[...]]" method the right method to use for what I want to
>do?  Or, is it just that the time it takes my function f[x_] to perform one
>calculation, when amortized over the whole 28,900 data points, just adds up
>to a lot of time (32 seconds in my example).
>
>Regards, Matt

Notebook[{

Cell[CellGroupData[{
Cell[BoxData[
    \($Version\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`"4.1 for Microsoft Windows (November 2, 2000)"\
\)], "Output"]
}, Open  ]],

Cell[CellGroupData[{

Cell[BoxData[
    \($OperatingSystem\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`"WindowsNT"\)], "Output"]
}, Open  ]],

Cell["\<\
Consider a list of 300,000 real elements. (I have increased the \
number 10-fold from your example to obtain more reliable timing \
numbers.)\
\>", "Text"],

Cell[BoxData[
    \(\(data = Table[Random[], {300000}];\)\)], "Input"],

Cell["\<\
The time to compute the sum of these elements is (on my PC 400 MHz)\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Timing[Plus @@ data]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{0.732`\ Second, 
      150213.20944801893`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "This is far less than the 32 seconds you found, so I would have to \
guess that most of the time spent in the operation ",
  StyleBox["f",
    FontSlant->"Italic"],
  ". But, you are correct, the Mean function is quite a bit faster:"
}], "Text"],

Cell[BoxData[
    \(<< Statistics`DescriptiveStatistics`\)], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Timing[Mean[data]]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{0.010000000000000231`\ Second, 
      0.5007106981600735`}\)], "Output"]
}, Open  ]],

Cell["\<\
The mean function, being a package, is written in high-level code, so \
I am a bit surprised at its speed. Let's look inside the Mean \
function to see how it does it.\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Attributes[Mean]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{Protected, ReadProtected}\)], "Output"]
}, Open  ]],

Cell[BoxData[{
    \(\(Unprotect[Mean];\)\), "\n", 
    \(ClearAttributes[Mean, ReadProtected]\)}], "Input"],

Cell[CellGroupData[{

Cell[BoxData[
    \(?? Mean\)], "Input"],

Cell[BoxData[
    RowBox[{"\<\"Mean[list] gives the mean of the entries in \
list.\"\>", " ", 
      ButtonBox[
        StyleBox["More\[Ellipsis]",
          "SR"],
        ButtonData:>"Statistics`DescriptiveStatistics`",
        Active->True,
        ButtonStyle->"AddOnsLink"]}]], "Print",
  CellTags->"Info3197183060-5340671"],

Cell[BoxData[
    InterpretationBox[GridBox[{
          {GridBox[{
                {\(Mean[
                      Statistics`DescriptiveStatistics`Private`list_] \
:= Tr[Statistics`DescriptiveStatistics`Private`list]\/Length[\
Statistics`DescriptiveStatistics`Private`list] /; 
                      VectorQ[
                          Statistics`DescriptiveStatistics`Private`\
list] && Length[Statistics`DescriptiveStatistics`Private`list] > 0\)}
                },
              GridBaseline->{Baseline, {1, 1}},
              ColumnWidths->0.999,
              ColumnAlignments->{Left}]}
          },
        GridBaseline->{Baseline, {1, 1}},
        ColumnAlignments->{Left}],
      Definition[ "Mean"],
      Editable->False]], "Print",
  CellTags->"Info3197183060-5340671"]
}, Open  ]],

Cell["\<\
The Mean function uses the built-in function Tr (trace) to compute \
the sum.\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(\(?Tr\)\)], "Input"],

Cell[BoxData[
    RowBox[{
    "\"\<Tr[list] finds the trace of the matrix or tensor list. \
Tr[list, f] finds a generalized trace, combining terms with f instead \
of Plus. Tr[list, f, n] goes down to level n in list.\>\"", " ", 
      ButtonBox[
        StyleBox["More\[Ellipsis]",
          "SR"],
        ButtonData:>"Tr",
        Active->True,
        ButtonStyle->"RefGuideLink"]}]], "Print",
  GeneratedCell->False,
  CellAutoOverwrite->False,
  CellTags->"Info3197182622-5672632"]
}, Open  ]],

Cell["Of course, it's the sum you want:", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Timing[Tr[data]]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{0.009999999999999787`\ Second, 
      150213.20944802207`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "So why is Plus@@list so slow? One possible reason is that ",
  StyleBox["Mathematica",
    FontSlant->"Italic"],
  " is trying to evaluate all elements of the list before summing. We \
can try to prevent that:"
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Timing[\(data2 = HoldComplete @@ data;\)]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{0.2400000000000002`\ Second, 
      Null}\)], "Output"]
}, Open  ]],

Cell[CellGroupData[{

Cell[BoxData[
    \(Timing[Plus @@ data2]\)], "Input"],

Cell[BoxData[
    \(TraditionalForm\`{0.4810000000000003`\ Second, 
      150213.20944801893`}\)], "Output"]
}, Open  ]],

Cell["\<\
HoldAllComplete cuts down the time a bit, but Plus@@ still much \
slower. Let's examine what happens with Trace:\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Trace[Plus @@ HoldAllComplete[1, 2, 3]]\)], "Input"],

Cell[BoxData[
    FormBox[
      RowBox[{"{", 
        RowBox[{
          TagBox[\(Plus @@ HoldAllComplete[1, 2, 3]\),
            HoldForm], ",", 
          TagBox[\(1 + 2 + 3\),
            HoldForm], ",", 
          TagBox["6",
            HoldForm]}], "}"}], TraditionalForm]], "Output"]
}, Open  ]],

Cell["\<\
Perhaps the time is taken forming the text expression \"1+2+3\". I \
suppose that the built-in function Tr dispenses with this step.\
\>", "Text"]
},
FrontEndVersion->"4.1 for Microsoft Windows",
ScreenRectangle->{{0, 1280}, {0, 979}},
WindowSize->{829, 740},
WindowMargins->{{0, Automatic}, {Automatic, 0}}
]


Tom


  • Prev by Date: Getting stylized text with a palette button
  • Next by Date: Q: How can I force ListPlot to show all data in graphics?
  • Previous by thread: Re: how to be as efficient as Mean[list] - efficient summing of function applied to list
  • Next by thread: Re: Re: how to be as efficient as Mean[list] - efficient summing of function applied to list