Re: A fast way to compare two vectors

• To: mathgroup at smc.vnet.net
• Subject: [mg121780] Re: A fast way to compare two vectors
• From: Ray Koopman <koopman at sfu.ca>
• Date: Sat, 1 Oct 2011 03:09:48 -0400 (EDT)
• Delivered-to: l-mathgroup@mail-archive0.wolfram.com
• References: <j5m44t\$ple\$1@smc.vnet.net> <j5mt8t\$sn8\$1@smc.vnet.net> <j63taq\$6g3\$1@smc.vnet.net>

```On Sep 30, 1:06 am, "Oleksandr Rasputinov"
<oleksandr_rasputi... at hmamail.com> wrote:
> On Tue, 27 Sep 2011 23:52:36 +0100, Ray Koopman <koop... at sfu.ca> wrote:
>> On Sep 26, 1:13 am, Yasha Gindikin <gindi... at gmail.com> wrote:
>>> Alas, there was a misprint in my code, I'm terribly sorry
>>> about that. The length of the intersection R should be >=n-2,
>>> where n is the length of the vector a[[p]]. Thank you so much
>>> for your realization of the poscom function, I'll explore the
>>> performance gain and report here.:)
>>
>> vcomb is a version of hyperfastVectorCompareBag
>> that always returns all the differences
>>
>> vcomb = Compile[{{v1, _Integer, 1}, {v2, _Integer, 1}},
>>   Block[{i1 = 1, i2 = 1,
>>          d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
>>    (* Run along the lists, recording differences as we go *)
>>    While[i1 <= Length[v1] && i2 <= Length[v2],
>>          Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
>>                v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
>>                True               , i1++; i2++                    ]];
>>    (* Fix up in case we ran off the end of one of the lists *)
>>    While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
>>    While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
>>    {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ] ] ;
>>
>> vkom is a merged version of two poskom's
>> that also returns all the differences
>>
>> vkom[a_,b_] := Block[{
>> r = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>>  {1, {{0, Length@a}, Transpose@{a}}, Range@Length@a} ],
>> s = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>>  {1, {{0, Length@b}, Transpose@{b}}, Range@Length@b} ]},
>> r[[b]] = ConstantArray[0,Length@b];
>> s[[a]] = ConstantArray[0,Length@a];
>> {r /. SparseArray[_,_,_,d_] :> d[[3]],
>>  s /. SparseArray[_,_,_,d_] :> d[[3]]}]
>>
>> This is one of the approximate break-even data configurations
>>
>> ab = Table[Sort@RandomSample[Range@200,100],{1*^4},{2}];
>>
>> AbsoluteTiming[u = vcomb @@@ ab;]
>>
>> {2.202807, Null}
>>
>> AbsoluteTiming[v = vkom @@@ ab;]
>>
>> {2.101078, Null}
>>
>> u === v
>>
>> True
>
> These timings seem to be system-dependent, as I observe vcomb to be about
> twice as fast as vkom for any length of list on Windows, with no apparent
> differences between Mathematica 5.2, 7.0.1, and 8.0.1. Nonetheless it may
> be the case that invoking a CompiledFunction incurs considerably more
> overhead on other platforms and so for relatively short vectors vkom would
> be faster as shown by your results. If we restrict ourselves to version 8,
> inputs of the particular form used here can be processed more quickly
> still by taking advantage of CompiledFunction auto-parallelization:
>
> vcomc = Compile[{{m, _Integer, 2}},
>      Block[{v1 = m[[1]], v2 = m[[2]],
>             i1 = 1, i2 = 1,
>             d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
>       (* Run along the lists, recording differences as we go *)
>       While[i1 <= Length[v1] && i2 <= Length[v2],
>             Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
>                   v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
>                   True               , i1++; i2++                    ]];
>       (* Fix up in case we ran off the end of one of the lists *)
>       While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
>       While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
>       {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ],
>      RuntimeAttributes -> {Listable} ] ;
>
> For the same definition of ab given above, on my system:
>
> AbsoluteTiming[u = vcomb @@@ ab;]
>
> {0.4531250, Null}
>
> AbsoluteTiming[v = vkom @@@ ab;]
>
> {0.8281250, Null}
>
> AbsoluteTiming[w = vcomc[ab];]
>
> {0.1406250, Null}
>
> u === v === w
>
> True

The times I gave were with v6 on a Mac G5. By "data configuration" I
meant the parameters of the data-generating process: two independent
vectors of length n = 100, with elements taken equiprobably without
replacement from 1 to M = 200. With n = 200, M = 400, I get

ab = Table[Sort@RandomSample[Range@400,200],{1*^4},{2}];

AbsoluteTiming[u = vcomb @@@ ab;]

{3.485911, Null}

AbsoluteTiming[v = Apply[vkom, ab, {1}]; ]

{2.400956, Null}

u === v

True

I don't know what range of values of n and M the OP is working with,
or if independence is reasonable in his problem -- with independence,
the expected proportion of matches is n/M -- or if the considerations
I mentioned in my Sep 28 post apply. In any case, the parallelization
speedup in vcomc is impressive and will be hard to beat.

```

• Prev by Date: Re: help with double integration
• Next by Date: Re: Re: ParallelDo and C-compiled routines
• Previous by thread: Re: MakeExpression and color
• Next by thread: Re: A fast way to compare two vectors