Re: Intersection and element counts
- To: mathgroup at smc.vnet.net
- Subject: [mg20677] Re: [mg20615] Intersection and element counts
- From: "Wolf, Hartmut" <hwolf at debis.com>
- Date: Sun, 7 Nov 1999 02:10:10 -0500
- Organization: debis Systemhaus
- References: <199911040713.CAA02652@smc.vnet.net.>
- Sender: owner-wri-mathgroup at wolfram.com
Arturas Acus schrieb: > > Dear Group > > I am interesting in intersecion, which takes into > account the number of the same elements. > > It is I would like > > myIntersection[{a,a,a,b,b,c},{a,a,b,b}] > > to give me {a,a,b,b}. > > Any solutions? > Do theoretical speed of this function is > very different of usual intersection algorithm speed? > Dear Arturas, I tried several ideas, which just came to my mind, and tested them. (This is of course not a definitive study, just the result my hacking yesterday evening, there will be more interesting ideas showing up in this usegroup - for shure!) l1 = {a, a, a, b, b, c}; l2 = {a, a, b, b}; As you did in your example, I assumed that both lists of (any expressions) are in "canonical order" The Functions considered: ------------------------- (1) we transform the lists to lists of {expr, count}, e.g. l1 -> {{a, 3}, {b, 2}, {c, 1}}; Intersect these, by using only the first component in SameTest and calculate the "intersection" count, then transform back to the original type of representation yourIntersection[1][l1_, l2_] := Module[{ul1, ul2, icount}, {ul1, ul2} = Map[{First[#], Length[#]} &, Split[#]] & /@ {l1, l2}; Flatten[Table[#1, {#2}] & @@@ Transpose[{First /@ (icount = {}; Intersection[ul1, ul2, SameTest -> ((If[First[#1] == First[#2], AppendTo[icount, Min[Last[#1], Last[#2]]]; True, False]) &)]), icount}], 1] ] Off[List::smtst] (* why this has to be done I don't know *) yourIntersection[1][l1, l2] {a, a, b, b} (2) we try to use built-in Intersection, by adjoining to each element its position in the repetition, so there ar no more identical elements, e.g. l1->{{{a, 1}, {a, 2}, {a, 3}}, {{b, 1}, {b, 2}}, {{c, 1}}}, Intersect and strip off the adornments. yourIntersection[2][l1_, l2_] := Module[{un1, un2}, {un1, un2} = Flatten[Transpose[{ #, Range[Length[#]]}] & /@ #, 1] & /@ Split /@ {l1, l2}; Transpose[Intersection[un1, un2]][[1]] ] yourIntersection[2][l1, l2] {a, a, b, b} (3) a variation of (2), we use a different method for making the elements singular, the idea is to avoid Split and such have one scan less yourIntersection[3][l1_, l2_] := Module[{un1, un2, cnt}, With[{f = If[First[#1] === #2, {#2, ++cnt}, {#2, cnt = 1}] &}, {un1, un2} = (cnt = 1; FoldList[f, {First[#], 1}, Rest[#] ] &) /@ {l1, l2}; Transpose[Intersection[un1, un2]][[1]] ]] yourIntersection[3][l1, l2] {a, a, b, b} (4) simple procedural merge (since input is in canonical order), use the function "Order" for testing yourIntersection[4][l1_, l2_] := With[{len1 = Length[l1], len2 = Length[l2]}, Module[{t1 = 1, t2 = 1, result = {}, obj}, While[t1 <= len1 && t2 <= len2, Switch[Order[(obj = l1[[t1]]), l2[[t2]]], 1, t1++, 0, (AppendTo[result, obj]; t1++; t2++), -1, t2++]]; result]] yourIntersection[4][l1, l2] {a, a, b, b} (5) another idea: get all elements in both lists (Union) and count the elements in both lists including zero count, e.g. l2 -> {2, 2, 0}, use simple count function; use these lists (minimum for each element) as directive to generate the result from the list of all objects anotherIntersection[l1_, l2_] := With[{objs = Union[l1, l2]}, With[{c1 = countFunction[objs, l1], c2 = countFunction[objs, l2]}, MapThread[Sequence @@ Table[#1, {Min[#2, #3]}] &, {objs, c1, c2}] ]] cntF1 = Function[{objs, inlist}, Count[inlist, #] & /@ objs]; yourIntersection[5][l1_, l2_] := Block[{countFunction = cntF1}, anotherIntersection[l1, l2]] yourIntersection[5][l1, l2] {a, a, b, b} (6) I suspect the count function, write my own cntF2 = Function[{objs, inlist}, With[{len0 = Length[objs], len1 = Length[inlist]}, Module[{t0 = 1, t1 = 1, result = {}, c = 0}, While[t0 <= len0, (While[t1 <= len1 && inlist[[t1]] === objs[[t0]], c++; t1++]; AppendTo[result, c]; c = 0; t0++)]; result]]]; yourIntersection[6][l1_, l2_] := Block[{countFunction = cntF2}, anotherIntersection[l1, l2]] yourIntersection[6][l1, l2] {a, a, b, b} (7) yet another idea: we Split both lists, then Join and Sort them, and look for pairs of list with identical elements, output the short of these. Otherwise we drag along the information for the next comparison step by wrapping with "no", finally delete these spurious data yourIntersection[7][l1_, l2_] := Module[{sjl, no}, With[{ff = If[First[#1] === First[#2], If[Length[#1] <= Length[#2], List @@ #1, #2], no @@ #2] &}, sjl = Sort[Join[Split[l1], Split[l2]], Order[First[#1], First[#2]] == 1 &]; Flatten[DeleteCases[FoldList[ff, no[Null], sjl], _no], 1] ]] yourIntersection[7][l1, l2] {a, a, b, b} The Tests: ---------- We test several orders of mangitude of the input, first n=100: {xl1, xl2} = Table[With[{n = 100}, MapThread[ Table[#1, {#2}] &, {Range[n], Table[Floor[6 Exp[-Random[Real, {0., 3.5}]]], {n}]}]] // Flatten, {2}]; {{1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 7, 7, 7, 8, 8, 8, 8, 8, 10, 10, 10, 10, 13, 14, 14, 14, 14, 14, 16, 17, 18, 18, 18, 22, 22, 22, 26, 26, 26, 26, 27, 27, 27, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31, 31, 31, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 38, 38, 38, 38, 39, 41, 44, 44, 44, 45, 47, 47, 47, 51, 51, 51, 51, 51, 52, 53, 55, 55, 55, 55, 55, 56, 56, 56, 57, 57, 58, 61, 62, 63, 65, 65, 65, 65, 65, 66, 71, 72, 72, 72, 72, 72, 75, 77, 78, 79, 81, 81, 81, 81, 82, 82, 82, 82, 82, 83, 87, 87, 87, 87, 90, 90, 91, 92, 95, 95, 95, 95, 95, 96, 97, 97, 97}, {2, 2, 2, 3, 3, 4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 9, 9, 9, 10, 12, 14, 14, 14, 15, 15, 19, 19, 24, 24, 24, 24, 24, 27, 29, 29, 29, 29, 34, 34, 36, 37, 37, 38, 38, 38, 38, 40, 43, 43, 45, 45, 45, 49, 51, 51, 53, 54, 54, 56, 56, 56, 56, 56, 57, 61, 61, 61, 61, 63, 63, 64, 64, 65, 66, 69, 69, 71, 71, 76, 76, 77, 79, 79, 80, 80, 80, 81, 81, 81, 83, 85, 85, 85, 85, 85, 87, 87, 88, 88, 90, 91, 91, 93, 99, 99, 99, 99, 99}} (I had to play around with the constants (6 and 3.5) to get something "pleasant") Length /@ {xl1, xl2} {148, 108} Evaluate[Function[x, Hold[Timing[x[##]] ]] /@ Array[yourIntersection, 7]] &[ xl1, xl2] // ReleaseHold {{0.29 Second, -same as below-}, {0.02 Second, -same as below-}, {0.06 Second, -same as below-}, {0.05 Second, -same as below-}, {0.07 Second, -same as below-}, {0.11 Second, -same as below-}, {0.071 Second, {2, 2, 2, 3, 4, 8, 8, 10, 14, 14, 14, 27, 29, 29, 29, 34, 34, 36, 38, 38, 38, 38, 45, 51, 51, 53, 56, 56, 56, 57, 61, 63, 65, 66, 71, 77, 79, 81, 81, 81, 83, 87, 87, 90, 91}}} Length[%[[-1, -1]]] 45 to better compare, let's extract the Timing {0.28 Second, 0.03 Second, 0.04 Second, 0.05 Second, 0.07 Second, 0.11 Second, 0.071 Second} Next is n=1000: Length /@ ({xxl1, xxl2} = Table[With[{n = 1000}, -code as above-]) {1201, 1217} Evaluate[Function[x, Hold[Timing[x[##]][[1]] ]] /@ Array[yourIntersection, 7]] &[xxl1, xxl2] // ReleaseHold {26.949 Second, 0.26 Second, 0.361 Second, 0.581 Second, 0.56 Second, 1.483 Second, 0.941 Second} Finally n=10000: Length /@ ({xxxl1, xxxl2} = Table[With[{n = 1000}, -code as above-]) {12029, 11629} We eliminate the first function from the contest Evaluate[Function[x, Hold[Timing[x[##]][[1]] ]] /@ Drop[Array[yourIntersection, 7], 1]] &[xxxl1, xxxl2] // ReleaseHold {-not calculated-, 2.493 Second, 3.465 Second, 12.859 Second, 19.939 Second, 46.196 Second, 10.014 Second} A short discussion: ------------------- As often, the first idea was not the best one; #1 is worst and seemingly O[n^2]. But why? One might spot 'AppendTo[icount, Min[Last[#1], Last[#2]]]' as being the source of the imperformance and O[n^2] behaviour. Hoewever this is not true! I get quite the same timing if I replace that with icount={icount, Min[Last[#1], Last[#2]]} and Flatten the result (be cautious if you have general expressions instead of numbers in my test, as not to Flatten them too). My experience is, that AppendTo (now, in Version 4 ??) has become pretty fast if you don't access to the list in between repeated appends. (My impression is, Wolfram, they (now?) use delayed evaluation, so some past arguments have to be reconsidered.) I cannot spot the source for O[n^2] behaviour, Split, Map, Transpose, Flatten are clearly O[n], Intersection might be O[n log n] but *could* be O[n] for our input. Of course SameTest slows down Intersection, but should not switch to another class of complexity. What is really suspect is, although the correct result is being produced, I had to switch off spurious error messages: Off[List::smtst]. Am I doing something wrong? (if you can see that, please tell me!), or is there some bug in Intersection? The other algorithms which don't appear to be linear are #5 and #6 (the status of # 4 is not quite clear from the measurements). I attribute this to that Union operation, which should behave as O[n log n]. To my surprise #5 is better that #6, it countains Count[inlist, #] & /@ objs] as counting function. Well I think Wolfram did a good job for Count and used a similar idea than I did here, but they are somewhat nearer to the heart of the machine. (That should make Count of O[n log n] or even O[n] for our (canonically ordered) input.) I'm clearly disappointed from #4. I wrote that in order to have a linear algorithm for certain (ref. to my remarks concerning AppendTo above), obviously it's too *procedural*, too far off the machine language. The score of #7 however is better than I expected; the Sort inside slows it to O[n log n] *or* the internal Sort algorithm makes use of the partial ordering state after the Join of two ordered list (?). #2 proved to be the best one (only of *these* of course), but Split, Transpose, Flatten, Range and "naked" Intersection are of the best, and this came not very much to a surprise. To replace in #3 Range by *procedural* code in order to save one (suspected) scan through the data didn't pay off. And, of course you may compile one or the other function, but I'm leaving that for Carl ;-) Kind regards, Hartmut
- References:
- Intersection and element counts
- From: "Arturas Acus" <acus@itpa.lt>
- Intersection and element counts