Mathematica 9 is now available
Services & Resources / Wolfram Forums
-----
 /
MathGroup Archive
2005
*January
*February
*March
*April
*May
*June
*July
*August
*September
*October
*November
*December
*Archive Index
*Ask about this page
*Print this page
*Give us feedback
*Sign up for the Wolfram Insider

MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: letrec/named let

  • To: mathgroup at smc.vnet.net
  • Subject: [mg56842] Re: letrec/named let
  • From: DrBob <drbob at bigfoot.com>
  • Date: Sat, 7 May 2005 04:17:12 -0400 (EDT)
  • References: <069a01c55260$afdad760$6400a8c0@Main> <opsqc8fk0aiz9bcq@monster.ma.dl.cox.net> <06cb01c55267$3a6b5930$6400a8c0@Main> <opsqdiiiixiz9bcq@monster.ma.dl.cox.net> <083a01c5528e$c7e11da0$6400a8c0@Main> <opsqdmxkhmiz9bcq@monster.ma.dl.cox.net> <084901c55293$e05d53d0$6400a8c0@Main> <opsqdqs5jsiz9bcq@monster.ma.dl.cox.net> <087801c552b0$e94d5bd0$6400a8c0@Main>
  • Reply-to: drbob at bigfoot.com
  • Sender: owner-wri-mathgroup at wolfram.com

Indeed, for real data, your substitute for ordering is much faster (in 5.1.1) than the built-in. If it's 2.5 times SLOWER on your machine, that does seem to imply WRI has radically DECREASED Ordering's performance from 4.1 to 5.1.1.

It's hard to believe they'd do that, but apparently they have.

> The only possibility I can think of is that
> ord=Ordering[data] returns a packed array of integers when data is integer,
> while ord=Ordering[data] returns an unpacked array of integers when data is
> real.

Exactly right. See below. Maybe this explains the performance issue for Ordering itself, too.

(Note: 'data' is packed in both tests.)

pq = Developer`PackedArrayQ;
carlTimed[s_] := Module[{ord, t, o, ans},
     Print@Timing[ord = Ordering@s; Ordering];
     Print@Timing[t = FoldList[
     Plus, 1, Sign[Abs[ListCorrelate[{1, -1}, s[[ord]]]]]]; FoldList];
     Print@Timing[o = Ordering@ord; Ordering];
     Print@Timing[ans = t[[o]]; Part];
     Print[pq /@ {ord, t, o, ans}];
     ans]
ordering[x_List] := Round@Sort[Transpose[{x, N@Range@Length@x}]][[All, 2]]
carlNewOrder[s_] := Module[{ord, t, o, ans},
     Print@Timing[ord = ordering@s; ordering];
     Print@Timing[
     t = FoldList[Plus, 1, Sign[Abs[ListCorrelate[{1, -1}, s[[ord]]]]]];
     FoldList];
     Print@Timing[o = ordering@ord; ordering];
     Print@Timing[ans = t[[o]]; Part];
     Print[pq /@ {ord, t, o, ans}];
     ans]

data = Table[Random[], {10^6}];
Timing[carlTimed@data; Total]
Timing[carlNewOrder@data; Total]

{8. Second,Ordering}
{0.391 Second,FoldList}
{7.25 Second,Ordering}
{0.063 Second,Part}
{False,True,True,True}
{15.891 Second,Total}

{1.063 Second,ordering}
{0.343 Second,FoldList}
{7.36 Second,ordering}
{0.062 Second,Part}
{True,True,True,True}
{8.828 Second,Total}

Notice ordering returned a packed array (ord) for real data, but Ordering didn't. Also notice the second use (with Integer data) has Ordering and ordering equally fast, but that's despite ordering being applied to a packed integer array, but Ordering applied to an unpacked array.

For integer (Poisson) data, Ordering is much faster than ordering BOTH times it is used. (It's applied to, and returns, packed arrays each time in both routines.)

data = 1 + RandomArray[PoissonDistribution[2], {10^6}];
Timing[carlTimed@data; Total]
Timing[carlNewOrder@data; Total]

{0.781 Second,Ordering}
{0.266 Second,FoldList}
{0.64 Second,Ordering}
{0.016 Second,Part}
{True,True,True,True}
{1.703 Second,Total}

{6.969 Second,ordering}
{0.234 Second,FoldList}
{5.703 Second,ordering}
{0.031 Second,Part}
{True,True,True,True}
{12.937 Second,Total}

But you intended 'ordering' only for real data anyway.

Bobby

On Fri, 6 May 2005 23:00:22 -0400, Carl K. Woll <carl at woll2woll.com> wrote:

> ----- Original Message -----
> From: "DrBob" <drbob at bigfoot.com>
To: mathgroup at smc.vnet.net
> To: "Carl K. Woll" <carl at woll2woll.com>
> Subject: [mg56842] Re: letrec/named let
>
>
>>> Union[data] now takes 0.626 seconds, while Ordering[data]
>>> takes 8.031 seconds. On my machine Union[data] takes 1.2
>>> seconds and Ordering[data] takes 1.1 seconds.
>>
>> That can have many interpretations, can't it?
>>
>> For instance...
>>
>> eqns = Thread[{1.2, 1.1}/{0.626, 8.031} ==
>>  (myMachine/yourMachine)* {unionSpeedup, orderingSpeedup}];
>>
>> {1/orderingSpeedup, myMachine/yourMachine} /.
>>    ToRules@Reduce@Flatten@{eqns, unionSpeedup == 1}
>>
>> {13.9954, 1.91693}
>>
>> That says my machine is 1.9 times as fast as yours (not unlikely), Union
>> is equally fast in both versions (not unlikely), and Ordering is 14 times
>> faster in 4.1 than 5.1.1 (possible, but unlikely).
>>
>> or...
>>
>> {unionSpeedup, yourMachine/
>>     myMachine} /. ToRules@Reduce@Flatten@{eqns, orderingSpeedup == 1}
>>
>> {13.9954,7.30091}
>>
>> In that case, your machine is 7.3 times faster than mine (unlikely, but
>> possible), Union is 14 times faster in 5.1.1 than 4.1 (quite possible),
>> and Ordering is equally fast in both versions (also possible).
>>
>> I suspect BOTH routines are faster in the new version, but the improvement
>> in Union is 14 times larger than the improvement in Ordering:
>>
>> unionSpeedup/orderingSpeedup /. ToRules@Reduce@eqns
>>
>> 13.9954
>>
>> Bobby
>>
>
> Let's first compare Union and Sort. I assume that with data being a list of
> a million reals, that Sort[data] should take about the same time or a little
> less than Union[data], since Union also sorts. I find it unlikely that Sort,
> whose algorithmic complexity was figured out ages ago, could experience an
> order of magnitude increase in speed in the latest upgrade. Moreover, it
> seems like the kind of thing that Wolfram would advertise if such a speed
> increase occurred.
>
> At any rate, if Sort and Union take the same amount of time, that means Sort
> is about 13 times quicker than Ordering on your machine. Now, the algorithm
> for Sort should be essentially the same as the algorithm for Ordering, so
> I'm surprised that Ordering is so slow. In fact, it might be possible to
> come up with a function which is faster than the built in Ordering. For
> example,
>
> ordering[x:{__Real}] := Round @ Sort[ Transpose[{x,
> N@Range@Length@x}]][[All,2]]
>
> is only 2 1/2 times slower than Ordering for Reals on my machine, and is
> probably much faster than the built in Ordering on your machine.
>
> I'm also puzzled by some of your timings. The 3rd line of my function, which
> you call Part, takes 10 times as long when the data is real. But,
> t[[Ordering[ord]]] applies Ordering to ord, and ord=Ordering[data] is always
> a permutation of the first Length[data] integers. Moreover, t is always a
> list of Length[data] integers. Why does t[[Ordering[ord]]] take so long when
> ord is the Ordering of real data, but is so quick when ord is the Ordering
> of integer data? The only possibility I can think of is that
> ord=Ordering[data] returns a packed array of integers when data is integer,
> while ord=Ordering[data] returns an unpacked array of integers when data is
> real.
>
> Carl Woll
>
>> On Fri, 6 May 2005 19:32:32 -0400, Carl K. Woll <carl at woll2woll.com>
>> wrote:
>>
>>> ----- Original Message -----
>>> From: "DrBob" <drbob at bigfoot.com>
To: mathgroup at smc.vnet.net
>>> To: "Carl K. Woll" <carl at woll2woll.com>; "MathGroup"
>>> Sent: Friday, May 06, 2005 7:16 PM
>>> Subject: [mg56842] Re: letrec/named let
>>>
>>>
>>>> I was assuming nothing in your code would be slower in the newest
>>>> version,
>>>> so I was hoping to identify a component that had gotten faster from
>>>> version
>>>> to version in MY code. Perhaps Union is 12 times faster in 5.1.1, for
>>>> instance.
>>>>
>>>> Either way, as I said before, we need timings on both machines to narrow
>>>> it down.
>>>>
>>>> Nevertheless, here are timings for statements in "carl":
>>>>
>>>> carlTimed[s_]:=Module[{ord,t,ans},
>>>>     Print@Timing[ord=Ordering@s;Ordering];
>>>>
>>>> Print@Timing[t=FoldList[Plus,1,Sign[Abs[ListCorrelate[{1,-1},s[[ord]]]]]];
>>>>     FoldList];
>>>>     Print@Timing[ans=t[[Ordering[ord]]];Part]]
>>>>
>>>> data=1+RandomArray[PoissonDistribution[2],{10^6}];
>>>> Timing[three=carlTimed@data;]
>>>>
>>>> {0.813 Second,Ordering}
>>>> {0.281 Second,FoldList}
>>>> {0.641 Second,Part}
>>>> {1.735 Second,Null}
>>>>
>>>> data=Table[Random[],{10^6}];
>>>> Timing[three=carlTimed@data;]
>>>>
>>>> {8.031 Second,Ordering}
>>>> {0.391 Second,FoldList}
>>>> {7.281 Second,Part}
>>>> {15.906 Second,Null}
>>>>
>>>> Bobby
>>>>
>>>
>>> And so we discover that when data is a list of a million reals,
>>> Union[data]
>>> now takes 0.626 seconds, while Ordering[data] takes 8.031 seconds. On my
>>> machine Union[data] takes 1.2 seconds and Ordering[data] takes 1.1
>>> seconds.
>>> I think we've found the bottleneck. Ordering is no longer a very fast
>>> function, at least when it's argument is a list of reals.
>>>
>>> Carl Woll
>>>
>>>> On Fri, 6 May 2005 18:56:03 -0400, Carl K. Woll <carl at woll2woll.com>
>>>> wrote:
>>>>
>>>>> ----- Original Message -----
>>>>> From: "DrBob" <drbob at bigfoot.com>
To: mathgroup at smc.vnet.net
>>>>> To: "Carl K. Woll" <carl at woll2woll.com>; "Carl K. Woll"
>>>>> Sent: Friday, May 06, 2005 5:40 PM
>>>>> Subject: [mg56842] Re: letrec/named let
>>>>>
>>>>>
>>>>>> I'm guessing Union, ReplaceAll, Dispatch, and/or Thread are better
>>>>>> optimized in version 5.1.1. We'd have to get relative times for
>>>>>> various
>>>>>> statements on BOTH our machines, to narrow it down. Here they are for
>>>>>> my
>>>>>> machine:
>>>>>>
>>>>>> treat2[s_List] := Module[{u, t, d, ans},
>>>>>>     Print@Timing[u = Union@s; "Union"];
>>>>>>     Print@Timing[t = Thread[u -> Range@Length@u]; "Thread"];
>>>>>>     Print@Timing[d = Dispatch@t; "Dispatch"];
>>>>>>     Print@Timing[ans = s /. d; "ReplaceAll"];
>>>>>>     ans]
>>>>>>
>>>>>> data=1+RandomArray[PoissonDistribution[2],{10^6}];
>>>>>> Timing[treat2@data;]
>>>>>>
>>>>>> {0.265 Second,Union}
>>>>>> {0. Second,Thread}
>>>>>> {0. Second,Dispatch}
>>>>>> {0.297 Second,ReplaceAll}
>>>>>> {0.562 Second,Null}
>>>>>>
>>>>>> data=Table[Random[],{10^6}];
>>>>>> Timing[treat2@data;]
>>>>>>
>>>>>> {0.625 Second,Union}
>>>>>> {2.922 Second,Thread}
>>>>>> {1.625 Second,Dispatch}
>>>>>> {3.89 Second,ReplaceAll}
>>>>>> {10.328 Second,Null}
>>>>>>
>>>>>> If your machine has Dispatch (for instance) taking a larger fraction
>>>>>> of
>>>>>> the total time, we'll have a clue to what changed.
>>>>>>
>>>>>> I'm actually shocked by how fast this is, considering the relatively
>>>>>> naive
>>>>>> (IMO) method involved.
>>>>>>
>>>>>> Bobby
>>>>>>
>>>>>
>>>>> I'm not surprised by the timings you give for treat. I'm surprised by
>>>>> your
>>>>> timing for my function, carl. On my machine, applying my function to a
>>>>> million reals takes about 2 times as long as a Union. On your machine,
>>>>> it
>>>>> apparently takes about 24 times as long as a Union. The 1st and 3rd
>>>>> lines
>>>>> of
>>>>> my function are basically sorting operations, so the bottleneck on your
>>>>> machine must be the 2nd line. However, on my machine, the 2nd line is
>>>>> the
>>>>> fastest operation. The only candidates for a bottleneck seem to be
>>>>> FoldList
>>>>> and ListCorrelate, or perhaps something funny is going on with packed
>>>>> arrays. I'd be curious what timings my function would get if you
>>>>> "treat"ed
>>>>> it like treat2 <grin>.
>>>>>
>>>>> Carl Woll
>>>>>
>>>>>> On Fri, 6 May 2005 14:12:55 -0400, Carl K. Woll <carl at woll2woll.com>
>>>>>> wrote:
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "DrBob" <drbob at bigfoot.com>
To: mathgroup at smc.vnet.net
>>>>>>> To: "Carl K. Woll" <carlw at u.washington.edu>; "MathGroup"
>>>>>>> Sent: Friday, May 06, 2005 2:03 PM
>>>>>>> Subject: [mg56842] Re: letrec/named let
>>>>>>>
>>>>>>>
>>>>>>>> That's a very nice "rescue" of the Ordering@Ordering solution.
>>>>>>>>
>>>>>>>> Here's a test with (probably) no repeated elements, however, that
>>>>>>>> shows
>>>>>>>> treat ahead. Perhaps there's an intermediate situation where carl
>>>>>>>> wins,
>>>>>>>> or
>>>>>>>> it matters whether the data is integer or real?
>>>>>>>>
>>>>>>>> data=Table[Random[],{10^6}];
>>>>>>>> Timing[one=treat@data;]
>>>>>>>> Timing[three=carl@data;]
>>>>>>>> one==three
>>>>>>>>
>>>>>>>> {10.109 Second,Null}
>>>>>>>> {15.938 Second,Null}
>>>>>>>> True
>>>>>>>>
>>>>>>>
>>>>>>> Interesting. On my machine, carl is over 5 times faster than treat
>>>>>>> for
>>>>>>> the
>>>>>>> same test. I have version 4.1 on a Windows XP operating system. What
>>>>>>> is
>>>>>>> your
>>>>>>> setup? Perhaps you could test each individual line of carl to see
>>>>>>> where
>>>>>>> the
>>>>>>> bottleneck is.
>>>>>>>
>>>>>>> Carl Woll
>>>>>>>
>>>>>>>> Bobby
>>>>>>>>
>>>>>>>> On Fri, 6 May 2005 13:26:03 -0400, Carl K. Woll
>>>>>>>> <carlw at u.washington.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> "DrBob" <drbob at bigfoot.com> wrote in message
>>>>>>>>> news:d5f5ka$6dn$1 at smc.vnet.net...
>>>>>>>>>> Our solutions agree on lists of strictly positive integers, but
>>>>>>>>>> timings
>>>>>>>>>> depend a great deal on the minimum value:
>>>>>>>>>>
>>>>>>>>>> Clear[treat, andrzej]
>>>>>>>>>> treat[s_List] := Module[{u = Union@s}, s /. Dispatch@Thread[u ->
>>>>>>>>>> Range@
>>>>>>>>>>     Length@u]]
>>>>>>>>>> andrzej[s_List] :=
>>>>>>>>>>      First@NestWhile[Apply[If[FreeQ[##], {#1 /. x_ /;
>>>>>>>>>>             x > #2 :> x - 1, #2}, {#1, #2 + 1}] &, #] &, {s, 1},
>>>>>>>>>>     Last[#] < Max[First[#]] &]
>>>>>>>>>>
>>>>>>>>>> data=1+RandomArray[PoissonDistribution[2],{10^6}];
>>>>>>>>>> Timing[one=treat@data;]
>>>>>>>>>> Timing[two=andrzej@data;]
>>>>>>>>>> one == two
>>>>>>>>>>
>>>>>>>>>> {0.593 Second,Null}
>>>>>>>>>> {0.657 Second,Null}
>>>>>>>>>> True
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>> Another possibility modeled after DrBob's first answer is the
>>>>>>>>> following:
>>>>>>>>>
>>>>>>>>> carl[s_] := Module[{ord,t},
>>>>>>>>>  ord = Ordering[s];
>>>>>>>>>  t = FoldList[Plus, 1, Sign[Abs[ListCorrelate[{1, -1},
>>>>>>>>> s[[ord]]]]]];
>>>>>>>>>  t[[Ordering[ord]]]
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> If there are a lot of repeated elements in the data, then treat
>>>>>>>>> seems
>>>>>>>>> to
>>>>>>>>> be
>>>>>>>>> faster. On the other hand, if there aren't a lot of repeated
>>>>>>>>> elements,
>>>>>>>>> then
>>>>>>>>> carl seems to be faster. It seems like it ought to be possible to
>>>>>>>>> compute
>>>>>>>>> Ordering[Ordering[data]] more quickly since
>>>>>>>>> Ordering[Ordering[Ordering[data]]] equals Ordering[data], but I
>>>>>>>>> couldn't
>>>>>>>>> think of a way.
>>>>>>>>>
>>>>>>>>> Carl Woll
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> DrBob at bigfoot.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> DrBob at bigfoot.com
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> DrBob at bigfoot.com
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> DrBob at bigfoot.com
>>
>>
>
>
>
>
>



-- 
DrBob at bigfoot.com


  • Prev by Date: Re: Mathematica Notebook Organiztion
  • Next by Date: meaning of a "*" in search string?
  • Previous by thread: Re: letrec/named let
  • Next by thread: Re: letrec/named let