Re: Eliminate duplicates using Union [] in a

*To*: mathgroup at smc.vnet.net
*Subject*: [mg48112] Re: [mg48063] Eliminate duplicates using Union [] in a
*From*: Tomas Garza <tgarza01 at prodigy.net.mx>
*Date*: Thu, 13 May 2004 00:09:02 -0400 (EDT)
*References*: <200405101051.GAA13867@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

I'm not sure how to use Union in your problem. I would proceed along the following lines (off the top of my head). Suppose you have the list "records", where the fields of interest are 4 and 5:

    In[1]:= records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2}, {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4}, {a5, b5, c5, d3, e3, f5}};

In this example records #1 and #4 have the same elements in fields 4 and 5, and records #3 and #5 also share the same elements in fields 4 and 5 (albeit different from the previous ones).

Now make a sublist with elements 4 and 5 of "records", then Sort and Split it to obtain the groups of duplicates (pairs here, but they could be triplets or larger):

    In[2]:= subs = ({#1[[4]], #1[[5]]} & ) /@ records
    Out[2]= {{d1, e1}, {d2, e2}, {d3, e3}, {d1, e1}, {d3, e3}}

    In[3]:= x = Split[Sort[subs]]
    Out[3]= {{{d1, e1}, {d1, e1}}, {{d2, e2}}, {{d3, e3}, {d3, e3}}}

Here we see that there are two subsets with duplicates:

    In[4]:= Length /@ x
    Out[4]= {2, 1, 2}

Now take the list of duplicates

    In[5]:= y = Select[x, Length[#1] >= 2 & ]
    Out[5]= {{{d1, e1}, {d1, e1}}, {{d3, e3}, {d3, e3}}}

and find the positions of the records which have duplicate pairs:

    In[7]:= Table[Position[records, {x1_, x2_, x3_, x4_, x5_, x6_} /; MemberQ[y[[j]], {x4, x5}]], {j, 1, Length[y]}]
    Out[7]= {{{1}, {4}}, {{3}, {5}}}

Now you can do whatever you want with the records with duplicate pairs (you didn't specify whether you wanted to eliminate them altogether or just keep one of them). If you want to keep just one record from each group, take

    In[8]:= #[[1]]&/@%
    Out[8]= {{1},{3}}

as the positions to keep, and delete the remaining records in each group.

Sorry if this looks too elaborate, but it works.

Tomas Garza
Mexico City

----- Original Message -----
From: "Mark Coleman" <mark at markscoleman.com>
To: mathgroup at smc.vnet.net
Subject: [mg48112] [mg48063] Eliminate duplicates using Union [] in a

> I know that if one applies the Union command to a list of values, it
> will return the list with duplicates removed. Can I use the Union
> command to eliminate duplicates when I have a list of data "records"?
> Specifically, I have a large data set that I read into a list on a
> row-oriented basis. Each element of the list consists of another list
> of about 30 different fields. Two of these fields are geographic
> indicators. If two or more records have identical values in these two
> fields, then the full records are considered duplicates and need to be
> identified and removed from the larger list (or at least marked for
> removal). Can Union do this?
>
> Thanks,
>
> -Mark
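The reply above stops at the positions of the duplicate groups and leaves "delete the rest" to the reader. A minimal sketch of that last step follows, assuming you want to keep the first record of each group. The names dupKeys, pos, toDelete and deduped are illustrative and do not appear in the original post, and the positions are found by searching the extracted field pairs rather than by the six-field pattern used in In[7].

```mathematica
(* Same sample data as in the reply above *)
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};

(* Extract fields 4 and 5 of every record *)
subs = records[[All, {4, 5}]];

(* Field pairs that occur at least twice, one group per distinct pair *)
dupKeys = Select[Split[Sort[subs]], Length[#] >= 2 &];

(* Positions (at level 1 of subs) of each duplicated pair *)
pos = Table[Position[subs, First[dupKeys[[j]]], {1}], {j, 1, Length[dupKeys]}];
(* pos is {{{1}, {4}}, {{3}, {5}}} *)

(* Keep the first position in each group, collect the others for deletion *)
toDelete = Flatten[Rest /@ pos, 1];
(* toDelete is {{4}, {5}} *)

(* Remove the redundant records; records 1, 2 and 3 remain *)
deduped = Delete[records, toDelete]
```

In Mathematica versions that provide DeleteDuplicatesBy (added well after this thread was written), DeleteDuplicatesBy[records, #[[{4, 5}]] &] should give the same result for this example, since it also keeps the first record seen for each field pair.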

**References**: **Eliminate duplicates using Union [] in a** *From:* Mark Coleman <mark@markscoleman.com>