MathGroup Archive 2006

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Filtering same elements from two lists

  • To: mathgroup at smc.vnet.net
  • Subject: [mg67965] Re: Filtering same elements from two lists
  • From: Rolf.Mertig at gmail.com
  • Date: Wed, 19 Jul 2006 05:21:06 -0400 (EDT)
  • References: <e9ibpm$51e$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

The Split-based solutions are not quite correct.
Also, using Dispatch it is possible to be even quicker than the nice
Reap/Sow
based solution.

Rolf Mertig
GluonVision GmbH, Berlin, Gernany
----------------------------------------------------------------------------

In[1]:= !!f.m
test1 = {{PLUS1, 55, 43}, {PLUS2, 555, 83}};
test2 = {{PLUS1, 77, 99}, {PLUS2, 52, 103}};
Print["simple check without Dispatch: ", test1 /. Function[{x,y,z},
{x,a_Integer,b_Integer}:>{x,a,b,y,z}]@@@test2];
(* create two lists of 10^n elements *)

<<DiscreteMath`Combinatorica` ;

Format[LineBreak[_]]:= "";
For[n = 1, n < 5, n=n+3,      Print[" n = ", n];
{r1n, s1n} = {RandomPermutation[10^n], RandomPermutation[10^n]};
list1 = Function[{x,y,z}, {Symbol["PLUS"<>ToString[x]],y,z}]@@@Table[
            {r1n[[i]], Random[Integer,{1,10^(n+1)}],
Random[Integer,{1,10^(n+1)}]},{i,10^n}];
list2 = Function[{x,y,z}, {Symbol["PLUS"<>ToString[x]],y,z}]@@@Table[
            {s1n[[j]], Random[Integer,{1,10^(n+1)}],
Random[Integer,{1,10^(n+1)}]},{j,10^n}];
(* using Dispatch is quite quick: *)
Print["t2 timing ", t2 = Timing[( r2 = list1/.
Dispatch[Function[{x,y,z}, x:> Sequence[x,y,z]]@@@list2] /.
                    {p_Symbol,x_Integer,y_,a_,b_}:>{p,a,b,x,y} );]];
(* if the order is irrelevant and there are the same amount of PLUS',
then: *)
Print["t3 timing ", t3 = Timing[( r3 = list2/.
Dispatch[Function[{x,y,z}, x:> Sequence[x,y,z]]@@@list1]);]];

(* erroneous Split-based solution:  Jens-Peer Kuska *)
Print["t4 timing ", t4 = Timing[( r4 =
Prepend[Join @@ (Rest /@ # ), #[[1, 1]]] & /@
Split[Sort[Join[list1, list2]], First[#1] ===
First[#2] &]
);]];

(* erroneous Split-based solution:  Peter Pein  *)
Print["t5 timing ", t5 = Timing[ ( r5 = (Flatten[{#1[[1,1]], Rest /@
#1}] & ) /@
                    Select[Split[Sort[Union[list1, list2]], #1[[1]] ===
#2[[1]] & ], Length[#1] > 1 & ] ) ;]];

(* correct Reap/Sow-based solution:  dkr *)
Print["t6 timing ", t6 = Timing[( r6 =
Reap[Sow[Rest@#,First@#]&/@
      Join[list1,list2],_,{#1,Sequence@@Flatten[#2]}&][[2]]
 );]];


(* show that something is not right with the Split-based solutions :
print things for n = 1: *)

If[n === 1,
   Print["list1 sorted = ", list1 // Sort ];
   Print["list2 sorted = ", list2 // Sort ];
   Print["r2 sorted = ", r2 // Sort ];
   Print["r4 sorted = ", r4 // Sort ];
   Print["r6 sorted = ", r6 // Sort ];
  ];

Print["check r2 with r3 ", Sort[r2] === Sort[r3]];
Print["check r2 with r4 ", Sort[r2] === Sort[r4]];
Print["check r2 with r5 ", Sort[r2] === Sort[r5]];
Print["check r2 with r6 ", Sort[r2] === Sort[r6]];
(* end For - loop *)]



In[1]:= <<f.m
simple check without Dispatch: {{PLUS1, 55, 43, 77, 99},

>    {PLUS2, 555, 83, 52, 103}}
 n = 1
t2 timing {0. Second, Null}
t3 timing {0. Second, Null}
t4 timing {0. Second, Null}
t5 timing {0. Second, Null}
t6 timing {0. Second, Null}
list1 sorted = {{PLUS1, 52, 100}, {PLUS10, 3, 53}, {PLUS2, 71, 17},
>    {PLUS3, 50, 63}, {PLUS4, 15, 18}, {PLUS5, 16, 17}, {PLUS6, 60, 63},
>    {PLUS7, 36, 67}, {PLUS8, 58, 35}, {PLUS9, 83, 90}}
list2 sorted = {{PLUS1, 22, 80}, {PLUS10, 49, 32}, {PLUS2, 67, 80},
>    {PLUS3, 64, 61}, {PLUS4, 69, 46}, {PLUS5, 75, 24}, {PLUS6, 80, 37},
>    {PLUS7, 52, 76}, {PLUS8, 33, 19}, {PLUS9, 92, 37}}
r2 sorted = {{PLUS1, 52, 100, 22, 80}, {PLUS10, 3, 53, 49, 32},
>    {PLUS2, 71, 17, 67, 80}, {PLUS3, 50, 63, 64, 61},
>    {PLUS4, 15, 18, 69, 46}, {PLUS5, 16, 17, 75, 24},
>    {PLUS6, 60, 63, 80, 37}, {PLUS7, 36, 67, 52, 76},
>    {PLUS8, 58, 35, 33, 19}, {PLUS9, 83, 90, 92, 37}}
r4 sorted = {{PLUS1, 22, 80, 52, 100}, {PLUS10, 3, 53, 49, 32},
>    {PLUS2, 67, 80, 71, 17}, {PLUS3, 50, 63, 64, 61},
>    {PLUS4, 15, 18, 69, 46}, {PLUS5, 16, 17, 75, 24},
>    {PLUS6, 60, 63, 80, 37}, {PLUS7, 36, 67, 52, 76},
>    {PLUS8, 33, 19, 58, 35}, {PLUS9, 83, 90, 92, 37}}
r6 sorted = {{PLUS1, 52, 100, 22, 80}, {PLUS10, 3, 53, 49, 32},
>    {PLUS2, 71, 17, 67, 80}, {PLUS3, 50, 63, 64, 61},
>    {PLUS4, 15, 18, 69, 46}, {PLUS5, 16, 17, 75, 24},
>    {PLUS6, 60, 63, 80, 37}, {PLUS7, 36, 67, 52, 76},
>    {PLUS8, 58, 35, 33, 19}, {PLUS9, 83, 90, 92, 37}}
check r2 with r3 True
check r2 with r4 False
check r2 with r5 False
check r2 with r6 True
 n = 4
t2 timing {0.140009 Second, Null}
t3 timing {0.128008 Second, Null}
t4 timing {0.236015 Second, Null}
t5 timing {0.264017 Second, Null}
t6 timing {0.244014 Second, Null}
check r2 with r3 True
check r2 with r4 False
check r2 with r5 False
check r2 with r6 True


  • Prev by Date: Re: Why Does Repeated Dot Product Take So Long?
  • Next by Date: line thickness 2D plot legend
  • Previous by thread: Re: Filtering same elements from two lists
  • Next by thread: Clustering as constraint solving