Re: A fast way to compare two vectors

*To*: mathgroup at smc.vnet.net
*Subject*: [mg121780] Re: A fast way to compare two vectors
*From*: Ray Koopman <koopman at sfu.ca>
*Date*: Sat, 1 Oct 2011 03:09:48 -0400 (EDT)
*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com
*References*: <j5m44t$ple$1@smc.vnet.net> <j5mt8t$sn8$1@smc.vnet.net> <j63taq$6g3$1@smc.vnet.net>

On Sep 30, 1:06 am, "Oleksandr Rasputinov" <oleksandr_rasputi... at hmamail.com> wrote:
> On Tue, 27 Sep 2011 23:52:36 +0100, Ray Koopman <koop... at sfu.ca> wrote:
>> On Sep 26, 1:13 am, Yasha Gindikin <gindi... at gmail.com> wrote:
>>> Alas, there was a misprint in my code, I'm terribly sorry
>>> about that. The length of the intersection R should be >=n-2,
>>> where n is the length of the vector a[[p]]. Thank you so much
>>> for your realization of the poscom function, I'll explore the
>>> performance gain and report here.:)
>>
>> vcomb is a version of hyperfastVectorCompareBag
>> that always returns all the differences
>>
>> vcomb = Compile[{{v1, _Integer, 1}, {v2, _Integer, 1}},
>>   Block[{i1 = 1, i2 = 1,
>>     d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
>>    (* Run along the lists, recording differences as we go *)
>>    While[i1 <= Length[v1] && i2 <= Length[v2],
>>     Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
>>           v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
>>           True               , i1++; i2++ ]];
>>    (* Fix up in case we ran off the end of one of the lists *)
>>    While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
>>    While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
>>    {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ] ] ;
>>
>> vkom is a merged version of two poskom's
>> that also returns all the differences
>>
>> vkom[a_,b_] := Block[{
>>   r = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>>       {1, {{0, Length@a}, Transpose@{a}}, Range@Length@a} ],
>>   s = SparseArray[ Automatic, {Max[a[[-1]],b[[-1]]]}, 0,
>>       {1, {{0, Length@b}, Transpose@{b}}, Range@Length@b} ]},
>>   r[[b]] = ConstantArray[0,Length@b];
>>   s[[a]] = ConstantArray[0,Length@a];
>>   {r /. SparseArray[_,_,_,d_] :> d[[3]],
>>    s /. SparseArray[_,_,_,d_] :> d[[3]]}]
>>
>> This is one of the approximate break-even data configurations
>>
>> ab = Table[Sort@RandomSample[Range@200,100],{1*^4},{2}];
>>
>> AbsoluteTiming[u = vcomb @@@ ab;]
>>
>> {2.202807, Null}
>>
>> AbsoluteTiming[v = vkom @@@ ab;]
>>
>> {2.101078, Null}
>>
>> u === v
>>
>> True
>
> These timings seem to be system-dependent, as I observe vcomb to be about
> twice as fast as vkom for any length of list on Windows, with no apparent
> differences between Mathematica 5.2, 7.0.1, and 8.0.1. Nonetheless it may
> be the case that invoking a CompiledFunction incurs considerably more
> overhead on other platforms and so for relatively short vectors vkom would
> be faster as shown by your results. If we restrict ourselves to version 8,
> inputs of the particular form used here can be processed more quickly
> still by taking advantage of CompiledFunction auto-parallelization:
>
> vcomc = Compile[{{m, _Integer, 2}},
>   Block[{v1 = m[[1]], v2 = m[[2]],
>     i1 = 1, i2 = 1,
>     d1 = Internal`Bag@Most[{0}], d2 = Internal`Bag@Most[{0}]},
>    (* Run along the lists, recording differences as we go *)
>    While[i1 <= Length[v1] && i2 <= Length[v2],
>     Which[v1[[i1]] < v2[[i2]], Internal`StuffBag[d1, i1]; i1++,
>           v1[[i1]] > v2[[i2]], Internal`StuffBag[d2, i2]; i2++,
>           True               , i1++; i2++ ]];
>    (* Fix up in case we ran off the end of one of the lists *)
>    While[i1 <= Length[v1], Internal`StuffBag[d1, i1]; i1++];
>    While[i2 <= Length[v2], Internal`StuffBag[d2, i2]; i2++];
>    {Internal`BagPart[d1, All], Internal`BagPart[d2, All]} ],
>   RuntimeAttributes -> {Listable} ] ;
>
> For the same definition of ab given above, on my system:
>
> AbsoluteTiming[u = vcomb @@@ ab;]
>
> {0.4531250, Null}
>
> AbsoluteTiming[v = vkom @@@ ab;]
>
> {0.8281250, Null}
>
> AbsoluteTiming[w = vcomc[ab];]
>
> {0.1406250, Null}
>
> u === v === w
>
> True

The times I gave were with v6 on a Mac G5.
By "data configuration" I meant the parameters of the data-generating
process: two independent vectors of length n = 100, with elements
taken equiprobably without replacement from 1 to M = 200.
With n = 200, M = 400, I get

ab = Table[Sort@RandomSample[Range@400,200],{1*^4},{2}];

AbsoluteTiming[u = vcomb @@@ ab;]

{3.485911, Null}

AbsoluteTiming[v = Apply[vkom, ab, {1}]; ]

{2.400956, Null}

u === v

True

I don't know what range of values of n and M the OP is working with,
or if independence is reasonable in his problem -- with independence,
the expected proportion of matches is n/M -- or if the considerations
I mentioned in my Sep 28 post apply. In any case, the parallelization
speedup in vcomc is impressive and will be hard to beat.
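
The n/M figure follows because, under independence, each element of one vector lands in the other with probability n/M, so the expected number of matches is n^2/M. A minimal sketch of an empirical check, assuming the same data-generating process as ab above; the names trials, pairs and matchFraction are illustrative only and do not appear elsewhere in this thread:

```mathematica
(* Sketch: estimate the proportion of matching elements for two
   independent sorted samples of size n drawn from 1..M.
   trials, pairs and matchFraction are hypothetical helper names. *)
n = 100; M = 200; trials = 10^4;

(* fraction of elements of a that also occur in b *)
matchFraction[{a_, b_}] := Length[Intersection[a, b]]/n;

(* same construction as ab above, with n and M as parameters *)
pairs = Table[Sort@RandomSample[Range@M, n], {trials}, {2}];

(* the average over all pairs should come out close to n/M *)
N@Mean[matchFraction /@ pairs]
```

Varying n and M here gives a quick way to see how far a real data set departs from the independence assumption before choosing between vcomb and vkom.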