Re: A fast way to compare two vectors
- To: mathgroup at smc.vnet.net
- Subject: [mg121678] Re: A fast way to compare two vectors
- From: "Oleksandr Rasputinov" <oleksandr_rasputinov at hmamail.com>
- Date: Sun, 25 Sep 2011 05:42:35 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <j5m44t$ple$1@smc.vnet.net>
On Sun, 25 Sep 2011 03:37:17 +0100, Yasha Gindikin <gindikin at gmail.com> wrote: > I'm looking for a hyper-fast way to compare two vectors using > Mathematica built-in functions. The problem arises in the calculation of > the Hamiltonian matrix elements of a many-particle quantum system. > > Consider two rows of equal length n, a[[p1] and a[[p2]]. Their elements > are integers from 1 to M, put in an increasing order, without repeats. > E.g., a[[p1]]={1,2,3,6,7,9} and a[[p2]]={1,2,4,7,8,9}. I need to find > the number of different elements (4 in the example) and, if this number > is less or equal to 4, their positions ((3,4) and (3,5) in the example). > The simplest way looks like this: > > diff = Intersection[a[[p1]], a[[p2]]]; > R = Dimensions[diff][[1]]; > If[R<5,d1 = Complement[a[[p1]], diff]; > d2 = Complement[a[[p2]], diff]; > k1 = Position[a[[p1]], d1[[1]]]; k2 = Position[a[[p1]], d1[[2]]]; > k3 = Position[a[[p2]], d2[[1]]]; k4 = Position[a[[p2]], d2[[2]]]]; > > This is slow, since four search operations are used (and because the > ordering of the elements is not taken advantage of). Is there a built-in > function that quickly finds the differences between the two vectors? > > Thank you. > Regards, > Yasha. > No. But you can achieve high performance with a hand-coded implementation specific to the problem at hand. Firstly, a few operations in your given code are redundant. Removing these and instituting a little good programming practice it becomes vectorCompare = Function[ {vec1, vec2, MAXDIFFS}, Block[{ d1 = Complement[vec1, vec2], d2 = Complement[vec2, vec1], k1, k2 }, If[Length[d1] + Length[d2] <= MAXDIFFS, k1 = Flatten@Position[vec1, Alternatives @@ d1]; k2 = Flatten@Position[vec2, Alternatives @@ d2] ]; {k1, k2} ] ]; which is called as vectorCompare[a[[p1]], a[[p2]], 4], giving {{3, 4}, {3, 5}} as we expect. But this is not where the big gains lie, since this problem can be attacked very efficiently in procedural code. Such code can even be compiled (from C, if necessary), in which case you will obtain the desired high performance: hyperfastVectorCompare = Compile[{ {vec1, _Integer, 1}, {vec2, _Integer, 1}, {MAXDIFFS, _Integer, 0} }, Block[{ i1 = 1, i2 = 1, diffct = 0, diffs1 = Most[{0}], diffs2 = Most[{0}] }, (* Run along the lists, recording differences as we go *) While[ i1 <= Length[vec1] && i2 <= Length[vec2] && diffct < MAXDIFFS, Which[ vec1[[i1]] == vec2[[i2]], i1++; i2++, vec1[[i1]] < vec2[[i2]], diffct++; AppendTo[diffs1, i1]; i1++, vec1[[i1]] > vec2[[i2]], diffct++; AppendTo[diffs2, i2]; i2++ ] ]; (* Fix up in case we ran off the end of one of the lists *) While[i1 <= Length[vec1] && diffct < MAXDIFFS, diffct++; AppendTo[diffs1, i1]; i1++ ]; While[i2 <= Length[vec2] && diffct < MAXDIFFS, diffct++; AppendTo[diffs2, i2]; i2++ ]; {diffs1, diffs2} ] ]; naturally, hyperfastVectorCompare[a[[p1]], a[[p2]], 4] also gives {{3, 4}, {3, 5}} for your inputs, but traverses the vectors only once and takes advantage of the ordering of the elements. Note that if you choose MAXDIFFS to be odd and less than the actual number of differences the compiled code will throw an error due to {diffs1, diffs2} not being a full-rank tensor; this can be avoided by passing diffs1 and diffs2 out separately but I have not bothered with it here since requesting an odd number of differences seemed to me inherently ill-defined and as such a rather unlikely corner case. If you need the number of differences explicitly, you can either pass this out separately as well or otherwise just count the number of elements in the returned list of positions. If you need higher performance still, AppendTo is the prime candidate for replacement since it copies the entire accumulated list of differences on every call. However, I doubt that this will present much of a limitation in practice unless you have very long vectors with a large number of differences. Best, O. R.