Re: A fast way to compare two vectors

*To*: mathgroup at smc.vnet.net*Subject*: [mg121710] Re: A fast way to compare two vectors*From*: "Oleksandr Rasputinov" <oleksandr_rasputinov at hmamail.com>*Date*: Mon, 26 Sep 2011 20:06:13 -0400 (EDT)*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com*References*: <j5m44t$ple$1@smc.vnet.net> <j5mt5u$smm$1@smc.vnet.net>

On Mon, 26 Sep 2011 09:19:18 +0100, Yasha Gindikin <gindikin at gmail.com> wrote: > Dear Oleksandr, thank you very much for your code. Actually, I was > trying to implement the same algorithm. My code was the following: > > Compare := Compile[{{a, _Integer, 1}, {b, _Integer, 1}}, > {s1 = s2 = 0; k = n; d1 = d2 = Table[0, {k, 3}]; > While[s1 < 3 && s2 < 3 && k - s1 > 0 && k - s2 > 0, > Which[a[[k - s1]] > b[[k - s2]], d1[[s1 + 1]] = a[[k - s1]]; s1++, > a[[k - s1]] < b[[k - s2]], d2[[s2 + 1]] = b[[k - s2]]; s2++, > a[[k - s1]] == b[[k - s2]], k--]]; Cm = Max[{s1, s2}]; > Which[Cm > 2, False, s1 - s2 == 1, d2[[s1]] = b[[1]], > s2 - s1 == 1, d1[[s2]] = a[[1]], s1 - s2 == 2, d2[[1]] = b[[2]]; > d2[[2]] = b[[1]], s2 - s1 == 2, d1[[1]] = a[[2]]; > d1[[2]] = a[[1]]]; Cm}]; > > But due to my very limited programming experience, the code turned out > to be not very efficient. It even works faster without the Compile > option, which puzzles me a lot. I will try to use your realization > instead. Thank you once again! > > Regards, > Yasha. > Dear Yasha The above example is slow for two reasons. The first is that SetDelayed (:=) is used instead of Set (=) in the definition. This means that the code is unnecessarily compiled afresh every time it is used. The second reason is that many of the values are not localised to the compiled code and such values are dealt with via Mathematica's interpreter. These problems can be addressed without changing the logic at all, simply by inserting a Block where local variable are defined, then passing the values of the localised variables out to the corresponding global variables at the end of the procedure: compare = Compile[{ {a, _Integer, 1}, {b, _Integer, 1} }, Block[{ s1 = 0, s2 = 0, k = Length[a], loc`Cm = 0, loc`d1 = Table[0, {3}], loc`d2 = Table[0, {3}] }, While[s1 < 3 && s2 < 3 && k - s1 > 0 && k - s2 > 0, Which[ a[[k - s1]] > b[[k - s2]], loc`d1[[s1 + 1]] = a[[k - s1]]; s1++, a[[k - s1]] < b[[k - s2]], loc`d2[[s2 + 1]] = b[[k - s2]]; s2++, a[[k - s1]] == b[[k - s2]], k-- ] ]; loc`Cm = Max[{s1, s2}]; Which[ loc`Cm > 2, False, s1 - s2 == 1, loc`d2[[s1]] = b[[1]], s2 - s1 == 1, loc`d1[[s2]] = a[[1]], s1 - s2 == 2, loc`d2[[1]] = b[[2]]; loc`d2[[2]] = b[[1]], s2 - s1 == 2, loc`d1[[1]] = a[[2]]; loc`d1[[2]] = a[[1]] ]; d1 = loc`d1; d2 = loc`d2; Cm = loc`Cm ] ]; Now, the code compiles fully and is quite fast. Unfortunately, it does not appear to give correct results: compare[{1, 2, 3, 6, 7, 9}, {1, 2, 4, 7, 8, 9}] returns Cm = 2 and sets d1 = {6, 3, 0} and d2 = {8, 4, 0}, whereas per your description I had understood that Cm = 4, d1 = {3, 4}, and d2 = {3, 5} were required. As I'm not sure whether the error lies in your procedure or in my understanding of your problem, I haven't attempted to change the logic of the above. If you need the function I gave previously to return values in global variables, you could use: compareCompiled = Compile[{ {vec1, _Integer, 1}, {vec2, _Integer, 1}, {MAXDIFFS, _Integer, 0} }, Block[{ i1 = 1, i2 = 1, diffct = 0, db1 = Internal`Bag@Most[{0}], db2 = Internal`Bag@Most[{0}], d1 = Most[{0}], d2 = Most[{0}] }, (* Run along the lists, recording differences as we go *) While[ i1 <= Length[vec1] && i2 <= Length[vec2] && diffct < MAXDIFFS, Which[ vec1[[i1]] == vec2[[i2]], i1++; i2++, vec1[[i1]] < vec2[[i2]], diffct++; Internal`StuffBag[db1, i1++], vec1[[i1]] > vec2[[i2]], diffct++; Internal`StuffBag[db2, i2++] ] ]; (* Fix up in case we ran off the end of one of the lists *) While[i1 <= Length[vec1] && diffct < MAXDIFFS, diffct++; Internal`StuffBag[db1, i1++]; ]; While[i2 <= Length[vec2] && diffct < MAXDIFFS, diffct++; Internal`StuffBag[db2, i2++]; ]; (* Convert bags to vectors *) d1 = Internal`BagPart[db1, All]; d2 = Internal`BagPart[db2, All]; (* Return values *) compare`d1 = d1; compare`d2 = d2; diffct ] ]; With[{cc = compareCompiled}, compare[vec1_, vec2_, MAXDIFFS_] := Block[{tmp`Cm, compare`d1, compare`d2}, If[(tmp`Cm = cc[vec1, vec2, MAXDIFFS + 1]) <= MAXDIFFS, d1 = compare`d1; d2 = compare`d2, d1 = d2 = {} ]; Cm = tmp`Cm ] ]; so that compare[{1, 2, 3, 6, 7, 9}, {1, 2, 4, 7, 8, 9}, 4] returns Cm = 4 and sets d1 = {3, 4} and d2 = {3, 5}, while compare[{1, 2, 3, 6, 7, 9}, {1, 2, 4, 7, 8, 9}, 3] returns Cm = 4 and sets d1 = d2 = {}. One aspect of the problem I am not very clear about is whether you require the total number of differences even when it exceeds the maximum number for which you need the positions. The above simply stops accumulating differences when the specified number is exceeded, so the maximum value of Cm you will see is MAXDIFFS+1. Of course, this is easily changed if necessary. I hope that these examples will be useful to you. Best, O. R.