Re: A fast way to compare two vectors

*To*: mathgroup at smc.vnet.net*Subject*: [mg121689] Re: A fast way to compare two vectors*From*: "Oleksandr Rasputinov" <oleksandr_rasputinov at hmamail.com>*Date*: Mon, 26 Sep 2011 04:13:54 -0400 (EDT)*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com*References*: <j5m44t$ple$1@smc.vnet.net> <j5mt5u$smm$1@smc.vnet.net>

On Sun, 25 Sep 2011 10:44:30 +0100, Oleksandr Rasputinov <oleksandr_rasputinov at hmamail.com> wrote: > On Sun, 25 Sep 2011 03:37:17 +0100, Yasha Gindikin <gindikin at gmail.com> > wrote: > >> I'm looking for a hyper-fast way to compare two vectors using >> Mathematica built-in functions. The problem arises in the calculation of >> the Hamiltonian matrix elements of a many-particle quantum system. >> >> Consider two rows of equal length n, a[[p1] and a[[p2]]. Their elements >> are integers from 1 to M, put in an increasing order, without repeats. >> E.g., a[[p1]]={1,2,3,6,7,9} and a[[p2]]={1,2,4,7,8,9}. I need to find >> the number of different elements (4 in the example) and, if this number >> is less or equal to 4, their positions ((3,4) and (3,5) in the example). >> The simplest way looks like this: >> >> diff = Intersection[a[[p1]], a[[p2]]]; >> R = Dimensions[diff][[1]]; >> If[R<5,d1 = Complement[a[[p1]], diff]; >> d2 = Complement[a[[p2]], diff]; >> k1 = Position[a[[p1]], d1[[1]]]; k2 = Position[a[[p1]], d1[[2]]]; >> k3 = Position[a[[p2]], d2[[1]]]; k4 = Position[a[[p2]], d2[[2]]]]; >> >> This is slow, since four search operations are used (and because the >> ordering of the elements is not taken advantage of). Is there a built-in >> function that quickly finds the differences between the two vectors? >> >> Thank you. >> Regards, >> Yasha. >> > > If you need higher performance still, AppendTo is the prime candidate for > replacement since it copies the entire accumulated list of differences on > every call. However, I doubt that this will present much of a limitation > in practice unless you have very long vectors with a large number of > differences. > In fact, having just read Daniel Lichtblau's description of the undocumented internal functions for manipulating expression bags (see http://stackoverflow.com/questions/6691491/implementing-a-quadtree-in-mathematica), it is clear that bags are an ideal (and utterly trivial) substitute for AppendTo in this case: hyperfastVectorCompareBag = Compile[{ {vec1, _Integer, 1}, {vec2, _Integer, 1}, {MAXDIFFS, _Integer, 0} }, Block[{ i1 = 1, i2 = 1, diffct = 0, diffs1 = Internal`Bag@Most[{0}], diffs2 = Internal`Bag@Most[{0}] }, (* Run along the lists, recording differences as we go *) While[ i1 <= Length[vec1] && i2 <= Length[vec2] && diffct < MAXDIFFS, Which[ vec1[[i1]] == vec2[[i2]], i1++; i2++, vec1[[i1]] < vec2[[i2]], diffct++; Internal`StuffBag[diffs1, i1]; i1++, vec1[[i1]] > vec2[[i2]], diffct++; Internal`StuffBag[diffs2, i2]; i2++ ] ]; (* Fix up in case we ran off the end of one of the lists *) While[i1 <= Length[vec1] && diffct < MAXDIFFS, diffct++; Internal`StuffBag[diffs1, i1]; i1++ ]; While[i2 <= Length[vec2] && diffct < MAXDIFFS, diffct++; Internal`StuffBag[diffs2, i2]; i2++ ]; {Internal`BagPart[diffs1, All], Internal`BagPart[diffs2, All]} ] ]; This version is extremely fast. Using your test vectors, vec1 = {1, 2, 3, 6, 7, 9}; vec2 = {1, 2, 4, 7, 8, 9}; but expanded a little (to give about 100,000 elements/400KB per vector) using Do[ vec1 = Join[vec1, vec1 + Max[vec1]]; vec2 = Join[vec2, vec2 + Max[vec2]], {14} ]; we get: Version Run time /s vectorCompare 92.36 hyperfastVectorCompare 1.375 hyperfastVectorCompareBag 0.031 More gains are available by translating to C ("CompilationTarget" -> "C") and compiling down into native code. Expanding the inputs a little more (to over 200 million elements/800MB per vector), we obtain: Version Run time /s hyperfastVectorCompareBag (MVM) 55.59 hyperfastVectorCompareBag (C) 8.578 which in my opinion is not too bad at all considering that this code is single-threaded and written wholly in the Mathematica language.