Re: A fast way to compare two vectors
- To: mathgroup at smc.vnet.net
- Subject: [mg121678] Re: A fast way to compare two vectors
- From: "Oleksandr Rasputinov" <oleksandr_rasputinov at hmamail.com>
- Date: Sun, 25 Sep 2011 05:42:35 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <j5m44t$ple$1@smc.vnet.net>
On Sun, 25 Sep 2011 03:37:17 +0100, Yasha Gindikin <gindikin at gmail.com>
wrote:
> I'm looking for a hyper-fast way to compare two vectors using
> Mathematica built-in functions. The problem arises in the calculation of
> the Hamiltonian matrix elements of a many-particle quantum system.
>
> Consider two rows of equal length n, a[[p1] and a[[p2]]. Their elements
> are integers from 1 to M, put in an increasing order, without repeats.
> E.g., a[[p1]]={1,2,3,6,7,9} and a[[p2]]={1,2,4,7,8,9}. I need to find
> the number of different elements (4 in the example) and, if this number
> is less or equal to 4, their positions ((3,4) and (3,5) in the example).
> The simplest way looks like this:
>
> diff = Intersection[a[[p1]], a[[p2]]];
> R = Dimensions[diff][[1]];
> If[R<5,d1 = Complement[a[[p1]], diff];
> d2 = Complement[a[[p2]], diff];
> k1 = Position[a[[p1]], d1[[1]]]; k2 = Position[a[[p1]], d1[[2]]];
> k3 = Position[a[[p2]], d2[[1]]]; k4 = Position[a[[p2]], d2[[2]]]];
>
> This is slow, since four search operations are used (and because the
> ordering of the elements is not taken advantage of). Is there a built-in
> function that quickly finds the differences between the two vectors?
>
> Thank you.
> Regards,
> Yasha.
>
No. But you can achieve high performance with a hand-coded implementation
specific to the problem at hand.
Firstly, a few operations in your given code are redundant. Removing these
and instituting a little good programming practice it becomes
vectorCompare = Function[
{vec1, vec2, MAXDIFFS},
Block[{
d1 = Complement[vec1, vec2], d2 = Complement[vec2, vec1],
k1, k2
},
If[Length[d1] + Length[d2] <= MAXDIFFS,
k1 = Flatten@Position[vec1, Alternatives @@ d1];
k2 = Flatten@Position[vec2, Alternatives @@ d2]
];
{k1, k2}
]
];
which is called as vectorCompare[a[[p1]], a[[p2]], 4], giving {{3, 4}, {3,
5}} as we expect.
But this is not where the big gains lie, since this problem can be
attacked very efficiently in procedural code. Such code can even be
compiled (from C, if necessary), in which case you will obtain the desired
high performance:
hyperfastVectorCompare = Compile[{
{vec1, _Integer, 1}, {vec2, _Integer, 1},
{MAXDIFFS, _Integer, 0}
},
Block[{
i1 = 1, i2 = 1,
diffct = 0,
diffs1 = Most[{0}], diffs2 = Most[{0}]
},
(* Run along the lists, recording differences as we go *)
While[
i1 <= Length[vec1] && i2 <= Length[vec2] && diffct < MAXDIFFS,
Which[
vec1[[i1]] == vec2[[i2]], i1++; i2++,
vec1[[i1]] < vec2[[i2]], diffct++; AppendTo[diffs1, i1]; i1++,
vec1[[i1]] > vec2[[i2]], diffct++; AppendTo[diffs2, i2]; i2++
]
];
(* Fix up in case we ran off the end of one of the lists *)
While[i1 <= Length[vec1] && diffct < MAXDIFFS,
diffct++; AppendTo[diffs1, i1]; i1++
];
While[i2 <= Length[vec2] && diffct < MAXDIFFS,
diffct++; AppendTo[diffs2, i2]; i2++
];
{diffs1, diffs2}
]
];
naturally, hyperfastVectorCompare[a[[p1]], a[[p2]], 4] also gives {{3, 4},
{3, 5}} for your inputs, but traverses the vectors only once and takes
advantage of the ordering of the elements. Note that if you choose
MAXDIFFS to be odd and less than the actual number of differences the
compiled code will throw an error due to {diffs1, diffs2} not being a
full-rank tensor; this can be avoided by passing diffs1 and diffs2 out
separately but I have not bothered with it here since requesting an odd
number of differences seemed to me inherently ill-defined and as such a
rather unlikely corner case. If you need the number of differences
explicitly, you can either pass this out separately as well or otherwise
just count the number of elements in the returned list of positions.
If you need higher performance still, AppendTo is the prime candidate for
replacement since it copies the entire accumulated list of differences on
every call. However, I doubt that this will present much of a limitation
in practice unless you have very long vectors with a large number of
differences.
Best,
O. R.