MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: Overlapping binning of differences of

  • To: mathgroup at smc.vnet.net
  • Subject: [mg92754] Re: [mg92717] Re: [mg92607] Overlapping binning of differences of
  • From: danl at wolfram.com
  • Date: Sun, 12 Oct 2008 04:33:27 -0400 (EDT)
  • References: <200810081024.GAA00469@smc.vnet.net>

> Art wrote:
>> Given two sorted vectors a and b of different lengths, what is the
>> best way to count the number of elements in the set of all differences
>> between elements of a and b that fall in overlapping bins of [-bsize -
>> i, bsize - i) for i in Range[-n, n], where bsize >= 1.
>>
>> Below are 2 implementations I've tried which are two slow and memory
>> intensive. I haven't quite been able to do it using BinCounts,
>> Partition, and ListCorrelate.
>>
>> Was wondering if there is a faster way.
>>
>> (* Generate random a, b  *)
>> T = 500; bsize = 10; n = 20;
>> r := Rest@FoldList[Plus, 0, RandomReal[ExponentialDistribution[0.01],
>> {T}]]
>> a = r; b = r;
>>
>> bindiff1[a_, b_, bsize_, n_] :=
>>   With[{d = Flatten@Outer[Subtract, a, b]},
>>   	Table[Count[d, _?(-bsize <= # - i < bsize &)], {i, -n, n}]]
>>
>> bindiff2[a_, b_, bsize_, n_] :=
>>  Module[{os, i, j, s, tmp,
>>    d = Sort@Flatten@Outer[Subtract, a, b],
>>    c = ConstantArray[0, {2 n + 1}]},
>>   For[os = 0; j = 1; i = -n,  i <= n,  i++; j++,
>>    s = Flatten@Position[Drop[d, os], _?(# >= -bsize + i &), 1, 1];
>>    If[s == {}, Break[],
>>     os += s[[1]] - 1;
>>     tmp = Flatten@Position[Drop[d, os], _?(# > bsize + i &), 1, 1];
>>     c[[j]] = If[tmp == {}, Length[d] - os, First@tmp - 1]]];
>>   Return[c]]
>>
>> First@Timing@bindiff[a,b, bsize, n] is about 36 seconds.
>>
>> First@Timing@bindiff2[a, b, bsize, n] is about 3 seconds but still too
>> slow and d uses up too much memory.
>>
>> Thanks!
>
> Something similar to your bindiff2 can be sent through Compile.
>
> bindiff3=Compile[{{l1,_Real,1}, {l2,_Real,1}, {bsize,_Integer},
> {n,_Integer}},
>   Module[{os, diffs, bins, j, k=-n, m},
>    diffs = Sort[Flatten[Outer[Plus, l1, -l2]]];
>    bins = ConstantArray[0, 2*n + 1];
>    For[j=1, j<=Length[diffs], j++,
>     If [diffs[[j]]<-bsize+k,Continue[]];
>     While[diffs[[j]]>bsize+k && k<=n, k++];
>     m=k;
>     If[k>n, Return[bins]];
>     While[-bsize+m<diffs[[j]]<bsize+m &&m<=n, bins[[m+n+1]]++; m++];
>     ];
>    bins
>    ]];
>
> With this I get around an order of magnitude improvement in speed on
> your example.
>
> Daniel Lichtblau
> Wolfram Research


Having read other responses I now realize that sorting the differences can
be the main bottleneck, at least when there are not too many bins to
consider. Here is code that avoids sorting. If each list is around 2000
elements then it is roughly twice as fast as the code above.

bindiff4 =
  Compile[{{l1, _Real, 1}, {l2, _Real,
     1}, {bsize, _Integer}, {n, _Integer}},
   Module[{diffs, bins, j, m, dj, ld, mx = bsize + n},
    bins = ConstantArray[0, 2*n + 1];
    diffs = Flatten[Outer[Plus, l1, -l2]];
    ld = Length[diffs];
    For[j = 1, j <= ld, j++,
     dj = diffs[[j]];
     If[Abs[dj] > mx, Continue[]];
     m = -n;
     While[dj > bsize + m, m++];
     While[-bsize + m < dj && m <= n, bins[[m + n + 1]]++; m++];
     ];
    bins]];

If there were many bins, it might be advantageous to use a binary search
to determine the range of bins into which each difference falls. (Another
response did something similar, but first sorted the differences and used
binary search to find the range of differences falling into each bin.)

Daniel Lichtblau
Wolfram Research




  • Prev by Date: Re: Inconsistent syntax for Row and Column?
  • Next by Date: Re: Variable amount of Buttons in Mathematica
  • Previous by thread: Re: Overlapping binning of differences of two lists
  • Next by thread: Re: Overlapping binning of differences of two lists