Re: Overlapping binning of differences of two lists
- To: mathgroup at smc.vnet.net
- Subject: [mg92729] Re: Overlapping binning of differences of two lists
- From: m.r at inbox.ru
- Date: Sat, 11 Oct 2008 06:47:04 -0400 (EDT)
- References: <gci2hv$t8$1@smc.vnet.net> <gckn7m$nom$1@smc.vnet.net>
Szabolcs Horvat wrote: > Art wrote: > > Given two sorted vectors a and b of different lengths, what is the > > best way to count the number of elements in the set of all differences > > between elements of a and b that fall in overlapping bins of [-bsize - > > i, bsize - i) for i in Range[-n, n], where bsize >= 1. > > > > Below are 2 implementations I've tried which are two slow and memory > > intensive. I haven't quite been able to do it using BinCounts, > > Partition, and ListCorrelate. > > > > Was wondering if there is a faster way. > > > > (* Generate random a, b *) > > T = 500; bsize = 10; n = 20; > > r := Rest@FoldList[Plus, 0, RandomReal[ExponentialDistribution[0.01], > > {T}]] > > a = r; b = r; > > > > bindiff1[a_, b_, bsize_, n_] := > > With[{d = Flatten@Outer[Subtract, a, b]}, > > Table[Count[d, _?(-bsize <= # - i < bsize &)], {i, -n, n}]] > > > > bindiff2[a_, b_, bsize_, n_] := > > Module[{os, i, j, s, tmp, > > d = Sort@Flatten@Outer[Subtract, a, b], > > c = ConstantArray[0, {2 n + 1}]}, > > For[os = 0; j = 1; i = -n, i <= n, i++; j++, > > s = Flatten@Position[Drop[d, os], _?(# >= -bsize + i &), 1, 1]; > > If[s == {}, Break[], > > os += s[[1]] - 1; > > tmp = Flatten@Position[Drop[d, os], _?(# > bsize + i &), 1, 1]; > > c[[j]] = If[tmp == {}, Length[d] - os, First@tmp - 1]]]; > > Return[c]] > > > > First@Timing@bindiff[a,b, bsize, n] is about 36 seconds. > > > > First@Timing@bindiff2[a, b, bsize, n] is about 3 seconds but still too > > slow and d uses up too much memory. > > > > The first thing that came to my mind was BinCounts and Partition. What > was the trouble you ran into when using them? > > bindiff3[a_, b_, bsize_, n_] := > With[{diffs = Flatten@Outer[Subtract, a, b]}, > Total /@ > Partition[BinCounts[diffs, {-bsize - n, bsize + n, 1}], 2 bsize, 1] > ] > > In[7]:= > Timing[r1 = bindiff1[a, b, bsize, n];] > Timing[r2 = bindiff2[a, b, bsize, n];] > Timing[r3 = bindiff3[a, b, bsize, n];] > > Out[7]= {43.7454, Null} > > Out[8]= {2.32665, Null} > > Out[9]= {0.479927, Null} > > In[10]:= r1 === r2 === r3 > Out[10]= True > > (I haven't measured memory use, but I am guessing that the critical > operation is Outer[], so the memory requirements of the three functions > are likely to be similar.) This can be sped up some more: bindiff4[a_, b_, bsize_, n_] := ListConvolve[ConstantArray[1, 2 bsize], BinCounts[Subtract @@ Transpose@Tuples@{a, b}, {-bsize - n, bsize + n}]] In[6]:= Timing[r3 = bindiff3[a, b, bsize, n];] Out[6]= {0.312, Null} In[7]:= Timing[r4 = bindiff4[a, b, bsize, n];] Out[7]= {0.11, Null} The advantages are that it doesn't unpack and that Subtract is called only once rather than repeatedly for each pair. Using Total[Partition[..., 2 bsize, 1], {2}] gives a comparable timing. Maxim Rytin m.r at inbox.ru