Re: A fast way to compare two vectors
- To: mathgroup at smc.vnet.net
- Subject: [mg121689] Re: A fast way to compare two vectors
- From: "Oleksandr Rasputinov" <oleksandr_rasputinov at hmamail.com>
- Date: Mon, 26 Sep 2011 04:13:54 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <j5m44t$ple$1@smc.vnet.net> <j5mt5u$smm$1@smc.vnet.net>
On Sun, 25 Sep 2011 10:44:30 +0100, Oleksandr Rasputinov
<oleksandr_rasputinov at hmamail.com> wrote:
> On Sun, 25 Sep 2011 03:37:17 +0100, Yasha Gindikin <gindikin at gmail.com>
> wrote:
>
>> I'm looking for a hyper-fast way to compare two vectors using
>> Mathematica built-in functions. The problem arises in the calculation of
>> the Hamiltonian matrix elements of a many-particle quantum system.
>>
>> Consider two rows of equal length n, a[[p1] and a[[p2]]. Their elements
>> are integers from 1 to M, put in an increasing order, without repeats.
>> E.g., a[[p1]]={1,2,3,6,7,9} and a[[p2]]={1,2,4,7,8,9}. I need to find
>> the number of different elements (4 in the example) and, if this number
>> is less or equal to 4, their positions ((3,4) and (3,5) in the example).
>> The simplest way looks like this:
>>
>> diff = Intersection[a[[p1]], a[[p2]]];
>> R = Dimensions[diff][[1]];
>> If[R<5,d1 = Complement[a[[p1]], diff];
>> d2 = Complement[a[[p2]], diff];
>> k1 = Position[a[[p1]], d1[[1]]]; k2 = Position[a[[p1]], d1[[2]]];
>> k3 = Position[a[[p2]], d2[[1]]]; k4 = Position[a[[p2]], d2[[2]]]];
>>
>> This is slow, since four search operations are used (and because the
>> ordering of the elements is not taken advantage of). Is there a built-in
>> function that quickly finds the differences between the two vectors?
>>
>> Thank you.
>> Regards,
>> Yasha.
>>
>
> If you need higher performance still, AppendTo is the prime candidate for
> replacement since it copies the entire accumulated list of differences on
> every call. However, I doubt that this will present much of a limitation
> in practice unless you have very long vectors with a large number of
> differences.
>
In fact, having just read Daniel Lichtblau's description of the
undocumented internal functions for manipulating expression bags (see
http://stackoverflow.com/questions/6691491/implementing-a-quadtree-in-mathematica),
it is clear that bags are an ideal (and utterly trivial) substitute for
AppendTo in this case:
hyperfastVectorCompareBag = Compile[{
{vec1, _Integer, 1}, {vec2, _Integer, 1},
{MAXDIFFS, _Integer, 0}
},
Block[{
i1 = 1, i2 = 1,
diffct = 0,
diffs1 = Internal`Bag@Most[{0}], diffs2 = Internal`Bag@Most[{0}]
},
(* Run along the lists, recording differences as we go *)
While[
i1 <= Length[vec1] && i2 <= Length[vec2] && diffct < MAXDIFFS,
Which[
vec1[[i1]] == vec2[[i2]], i1++; i2++,
vec1[[i1]] < vec2[[i2]], diffct++; Internal`StuffBag[diffs1, i1]; i1++,
vec1[[i1]] > vec2[[i2]], diffct++; Internal`StuffBag[diffs2, i2]; i2++
]
];
(* Fix up in case we ran off the end of one of the lists *)
While[i1 <= Length[vec1] && diffct < MAXDIFFS,
diffct++; Internal`StuffBag[diffs1, i1]; i1++
];
While[i2 <= Length[vec2] && diffct < MAXDIFFS,
diffct++; Internal`StuffBag[diffs2, i2]; i2++
];
{Internal`BagPart[diffs1, All], Internal`BagPart[diffs2, All]}
]
];
This version is extremely fast. Using your test vectors,
vec1 = {1, 2, 3, 6, 7, 9};
vec2 = {1, 2, 4, 7, 8, 9};
but expanded a little (to give about 100,000 elements/400KB per vector)
using
Do[
vec1 = Join[vec1, vec1 + Max[vec1]];
vec2 = Join[vec2, vec2 + Max[vec2]],
{14}
];
we get:
Version Run time /s
vectorCompare 92.36
hyperfastVectorCompare 1.375
hyperfastVectorCompareBag 0.031
More gains are available by translating to C ("CompilationTarget" -> "C")
and compiling down into native code. Expanding the inputs a little more
(to over 200 million elements/800MB per vector), we obtain:
Version Run time /s
hyperfastVectorCompareBag (MVM) 55.59
hyperfastVectorCompareBag (C) 8.578
which in my opinion is not too bad at all considering that this code is
single-threaded and written wholly in the Mathematica language.