Re: A fast way to compare two vectors
- To: mathgroup at smc.vnet.net
- Subject: [mg121780] Re: A fast way to compare two vectors
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sat, 1 Oct 2011 03:09:48 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <j5m44t$ple$1@smc.vnet.net> <j5mt8t$sn8$1@smc.vnet.net> <j63taq$6g3$1@smc.vnet.net>
On Sep 30, 1:06 am, "Oleksandr Rasputinov"
<oleksandr_rasputi... at hmamail.com> wrote:
> On Tue, 27 Sep 2011 23:52:36 +0100, Ray Koopman <koop... at sfu.ca> wrote:
>> On Sep 26, 1:13 am, Yasha Gindikin <gindi... at gmail.com> wrote:
>>> Alas, there was a misprint in my code, I'm terribly sorry
>>> about that. The length of the intersection R should be >=n-2,
>>> where n is the length of the vector a[[p]]. Thank you so much
>>> for your realization of the poscom function, I'll explore the
>>> performance gain and report here.:)
>>
>> vcomb is a version of hyperfastVectorCompareBag
>> that always returns all the differences
>>
>> vcomb = Compile[{{v1, _Integer, 1}, {v2, _Integer, 1}},
>> Block[{i1 = 1, i2 = 1,
>> d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
>> (* Run along the lists, recording differences as we go *)
>> While[i1 <= Length[v1] && i2 <= Length[v2],
>> Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
>> v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
>> True , i1++; i2++ ]];
>> (* Fix up in case we ran off the end of one of the lists *)
>> While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
>> While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
>> {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ] ] ;
>>
>> vkom is a merged version of two poskom's
>> that also returns all the differences
>>
>> vkom[a_,b_] := Block[{
>> r = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>> {1, {{0, Length@a}, Transpose@{a}}, Range@Length@a} ],
>> s = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>> {1, {{0, Length@b}, Transpose@{b}}, Range@Length@b} ]},
>> r[[b]] = ConstantArray[0,Length@b];
>> s[[a]] = ConstantArray[0,Length@a];
>> {r /. SparseArray[_,_,_,d_] :> d[[3]],
>> s /. SparseArray[_,_,_,d_] :> d[[3]]}]
>>
>> This is one of the approximate break-even data configurations
>>
>> ab = Table[Sort@RandomSample[Range@200,100],{1*^4},{2}];
>>
>> AbsoluteTiming[u = vcomb @@@ ab;]
>>
>> {2.202807, Null}
>>
>> AbsoluteTiming[v = vkom @@@ ab;]
>>
>> {2.101078, Null}
>>
>> u === v
>>
>> True
>
> These timings seem to be system-dependent, as I observe vcomb to be about
> twice as fast as vkom for any length of list on Windows, with no apparent
> differences between Mathematica 5.2, 7.0.1, and 8.0.1. Nonetheless it may
> be the case that invoking a CompiledFunction incurs considerably more
> overhead on other platforms and so for relatively short vectors vkom would
> be faster as shown by your results. If we restrict ourselves to version 8,
> inputs of the particular form used here can be processed more quickly
> still by taking advantage of CompiledFunction auto-parallelization:
>
> vcomc = Compile[{{m, _Integer, 2}},
> Block[{v1 = m[[1]], v2 = m[[2]],
> i1 = 1, i2 = 1,
> d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
> (* Run along the lists, recording differences as we go *)
> While[i1 <= Length[v1] && i2 <= Length[v2],
> Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
> v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
> True , i1++; i2++ ]];
> (* Fix up in case we ran off the end of one of the lists *)
> While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
> While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
> {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ],
> RuntimeAttributes -> {Listable} ] ;
>
> For the same definition of ab given above, on my system:
>
> AbsoluteTiming[u = vcomb @@@ ab;]
>
> {0.4531250, Null}
>
> AbsoluteTiming[v = vkom @@@ ab;]
>
> {0.8281250, Null}
>
> AbsoluteTiming[w = vcomc[ab];]
>
> {0.1406250, Null}
>
> u === v === w
>
> True
The times I gave were with v6 on a Mac G5. By "data configuration" I
meant the parameters of the data-generating process: two independent
vectors of length n = 100, with elements taken equiprobably without
replacement from 1 to M = 200. With n = 200, M = 400, I get
ab = Table[Sort@RandomSample[Range@400,200],{1*^4},{2}];
AbsoluteTiming[u = vcomb @@@ ab;]
{3.485911, Null}
AbsoluteTiming[v = Apply[vkom, ab, {1}]; ]
{2.400956, Null}
u === v
True
I don't know what range of values of n and M the OP is working with,
or if independence is reasonable in his problem -- with independence,
the expected proportion of matches is n/M -- or if the considerations
I mentioned in my Sep 28 post apply. In any case, the parallelization
speedup in vcomc is impressive and will be hard to beat.