MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Overlapping binning of differences of two lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg92654] Re: Overlapping binning of differences of two lists
  • From: Januk <ggroup at sarj.ca>
  • Date: Thu, 9 Oct 2008 06:38:45 -0400 (EDT)
  • References: <gci2hv$t8$1@smc.vnet.net>

Hi Art,

A very speedy way would be to take advantage of the fact that your
list d can be sorted.  You can then use a binary search algorithm to
find the starting and ending positions of the particular bin you're
looking at.  The difference will be the number of elements in that
range.

Here is a binary search algorithm taken almost verbatim from this list
(sorry, I can't find the reference):

FindPosSorted[lst_List, e_, p_: ((#1 <= #2) &)] := Module[
   {lower = 1, upper = Length[lst] + 1},
   While[lower != upper,
    With[{mid = Floor[(lower + upper)/2]},
      If[p[e, lst[[mid]]],
        upper = mid,
        lower = mid + 1
        ];
      ];
    ];
   If[e == lst[[lower]], lower++];
   Return[lower]
   ];

You can then use something like:
Timing[
 tst = Module[{d2},
    Table[
     d2 = d - i;
     FindPosSorted[d2, bsize] - FindPosSorted[d2, -bsize],
     {i, -n, n}
     ]
    ];
 ]

Compare this to your first concept:

First@Timing[
  tst2 = bindiff1[a, b, bsize, n]
  ]

On my system, there is an approximately 100x improvement in speed
using the binary search method.

I hope that helps.

Januk

On Oct 8, 6:37 am, Art <grenan... at gmail.com> wrote:
> Given two sorted vectors a and b of different lengths, what is the
> best way to count the number of elements in the set of all differences
> between elements of a and b that fall in overlapping bins of [-bsize -
> i, bsize - i) for i in Range[-n, n], where bsize >= 1.
>
> Below are 2 implementations I've tried which are two slow and memory
> intensive. I haven't quite been able to do it using BinCounts,
> Partition, and ListCorrelate.
>
> Was wondering if there is a faster way.
>
> (* Generate random a, b  *)
> T = 500; bsize = 10; n = 20;
> r := Rest@FoldList[Plus, 0, RandomReal[ExponentialDistribution[0.01],
> {T}]]
> a = r; b = r;
>
> bindiff1[a_, b_, bsize_, n_] :=
>   With[{d = Flatten@Outer[Subtract, a, b]},
>         Table[Count[d, _?(-bsize <= # - i < bsize &)], {i, -n, =
n}]]
>
> bindiff2[a_, b_, bsize_, n_] :=
>  Module[{os, i, j, s, tmp,
>    d = Sort@Flatten@Outer[Subtract, a, b],
>    c = ConstantArray[0, {2 n + 1}]},
>   For[os = 0; j = 1; i = -n,  i <= n,  i++; j++,
>    s = Flatten@Position[Drop[d, os], _?(# >= -bsize + i &), 1, 1]=
;
>    If[s == {}, Break[],
>     os += s[[1]] - 1;
>     tmp = Flatten@Position[Drop[d, os], _?(# > bsize + i &), 1, 1];
>     c[[j]] = If[tmp == {}, Length[d] - os, First@tmp - 1]]];
>   Return[c]]
>
> First@Timing@bindiff[a,b, bsize, n] is about 36 seconds.
>
> First@Timing@bindiff2[a, b, bsize, n] is about 3 seconds but still too
> slow and d uses up too much memory.
>
> Thanks!



  • Prev by Date: Re: $Assumptions
  • Next by Date: Re: How to print TraditionalForm without evaluation
  • Previous by thread: Re: Overlapping binning of differences of two lists
  • Next by thread: Re: Overlapping binning of differences of two lists