MathGroup Archive 2004

Re: Eliminate duplicates using Union [] in a

I'm not sure how to use Union for your problem. I would proceed along
the following lines (off the top of my head). Suppose you have the list
"records", where the fields of interest are 4 and 5:

records = {{a1, b1, c1, d1, e1, f1},
    {a2, b2, c2, d2, e2, f2},
    {a3, b3, c3, d3, e3, f3},
    {a4, b4, c4, d1, e1, f4},
    {a5, b5, c5, d3, e3, f5}};

In this example records #1 and #4 have the same elements in fields 4 and 5,
and records #3 and #5 also share their elements in fields 4 and 5 (although
these differ from the previous ones). Now make a sublist with elements 4 and
5 of each record in "records", then Sort and Split it to obtain the groups
of duplicates (pairs here, but they could be triplets or larger):

subs = ({#1[[4]], #1[[5]]} & ) /@ records
{{d1, e1}, {d2, e2}, {d3, e3}, {d1, e1}, {d3, e3}}
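
As an aside, the same sublist can be extracted with Part and a list of
column indices. This is only an alternative sketch: it assumes a version of
Mathematica in which Part accepts All, and the name subs2 is made up for the
comparison:

(* take columns 4 and 5 of every record *)
subs2 = records[[All, {4, 5}]]
{{d1, e1}, {d2, e2}, {d3, e3}, {d1, e1}, {d3, e3}}

subs2 === subs
True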

x = Split[Sort[subs]]
{{{d1, e1}, {d1, e1}}, {{d2, e2}},
{{d3, e3}, {d3, e3}}}
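
Sort is needed here because Split only groups runs of adjacent identical
elements. As a quick check, splitting the unsorted list leaves every pair in
a sublist of its own:

(* no Sort: equal pairs are not adjacent, so nothing is grouped *)
Split[subs]
{{{d1, e1}}, {{d2, e2}}, {{d3, e3}}, {{d1, e1}}, {{d3, e3}}}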

Here we see that there are two subsets with duplicates:

Length /@ x
{2, 1, 2}

Now take the list of duplicates

y = Select[x, Length[#1] >= 2 & ]
{{{d1, e1}, {d1, e1}}, {{d3, e3}, {d3, e3}}}

and find the positions of the records which have duplicate pairs:

Table[Position[records,
    {x1_, x2_, x3_, x4_, x5_, x6_} /; MemberQ[y[[j]], {x4, x5}]],
  {j, 1, Length[y]}]
{{{1}, {4}}, {{3}, {5}}}
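
The same positions can probably be found a little more directly by looking
up each duplicated pair in "subs", whose entries line up one-to-one with
"records", instead of pattern-matching whole records. This is only a sketch;
the level specification {1} is an added assumption so that only the
top-level entries of subs are compared:

(* First[#1] is the representative pair of each group in y *)
(Position[subs, First[#1], {1}] & ) /@ y
{{{1}, {4}}, {{3}, {5}}}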

Now you can do whatever you want with the records that have duplicate pairs
(you didn't specify whether you wanted to eliminate them altogether or just
keep one of them). If you want to keep just one of them, take one position
from each group and delete the rest. Sorry if this looks too elaborate, but
it works.
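
One possible way to carry out the keep-one step, given only as a sketch: it
assumes the position list shown above has been stored under the made-up name
"pos", and it keeps the first record of each group:

(* positions of the duplicate groups, as returned by the Table above *)
pos = {{{1}, {4}}, {{3}, {5}}};

(* Rest drops the first position of each group, so that record is kept *)
Delete[records, Flatten[Rest /@ pos, 1]]
{{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2}, {a3, b3, c3, d3, e3, f3}}

To eliminate every record that has a duplicate, delete all of the positions
instead:

Delete[records, Flatten[pos, 1]]
{{a2, b2, c2, d2, e2, f2}}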

Tomas Garza
Mexico City

----- Original Message ----- 
From: "Mark Coleman" <mark at>
To: mathgroup at
Subject: [mg48112] [mg48063] Eliminate duplicates using Union [] in a

> I know that if one applies the Union command to a list of values, it
> will return the list with duplicates removed. Can I use the Union
> command to eliminate duplicate when I have a list of data "records"?
> Specifically, I have a large data set that I read into a list on a
> row-oriented basis. Each element of the list consists of another list
> of about 30 different fields. Two of these fields are geographic
> indicators. If two or more records have identical values in these two
> fields, then the full records are considered duplicates and need to be
> identified and removed from the larger list (or at least marked for
> removal). Can Union do this?
> Thanks,
> -Mark
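
On the Union question itself: Union accepts a SameTest option, so a sketch
along the following lines may do the job in one step. It assumes that
agreement in fields 4 and 5 is the whole duplicate criterion (as in the
"records" example in the reply above). The result comes back sorted, and
which record of each group survives should not be relied on:

(* treat two records as the same when fields 4 and 5 agree *)
Union[records, SameTest -> (#1[[{4, 5}]] === #2[[{4, 5}]] & )]

It is worth checking this against a small example before running it on the
full data set, since a pairwise test may be slow on long lists.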
