MathGroup Archive: May 2004 [00172]


Re: Eliminate duplicates using Union [] in a

To: mathgroup at smc.vnet.net
Subject: [mg48112] Re: [mg48063] Eliminate duplicates using Union [] in a
From: Tomas Garza <tgarza01 at prodigy.net.mx>
Date: Thu, 13 May 2004 00:09:02 -0400 (EDT)
References: <200405101051.GAA13867@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

I'm not sure how to use Union for your problem. I would proceed along the
following lines (off the top of my head). Suppose you have the list "records",
where the fields of interest are 4 and 5:

In[1]:=
records = {{a1, b1, c1, d1, e1, f1},
    {a2, b2, c2, d2, e2, f2},
    {a3, b3, c3, d3, e3, f3},
    {a4, b4, c4, d1, e1, f4},
    {a5, b5, c5, d3, e3, f5}};

In this example, records #1 and #4 have the same elements in fields 4 and 5,
and records #3 and #5 also share fields 4 and 5 (with values different from
those of the first pair). Now make a sublist with elements 4 and 5 of each
record, then Sort and Split it to obtain the pairs of duplicates (there could
also be triplets or larger groups):

In[2]:=
subs = ({#1[[4]], #1[[5]]} & ) /@ records
Out[2]=
{{d1, e1}, {d2, e2}, {d3, e3}, {d1, e1}, {d3, e3}}
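A minimal sketch, assuming the same "records" list: Part with All and a list of field indices should extract the same key pairs in one step, and it does not depend on how many fields each record has (handy for records of about 30 fields). The name "keys" is illustrative only.

```mathematica
(* Same six-field example records as In[1] *)
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};

(* All = every record; {4, 5} = fields 4 and 5 of each record *)
keys = records[[All, {4, 5}]]
(* -> {{d1, e1}, {d2, e2}, {d3, e3}, {d1, e1}, {d3, e3}} *)
```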


In[3]:=
x = Split[Sort[subs]]
Out[3]=
{{{d1, e1}, {d1, e1}}, {{d2, e2}},
{{d3, e3}, {d3, e3}}}
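In later versions of Mathematica (not the one current when this was posted), GatherBy can group the whole records by their field-4/field-5 key without sorting first; groups come out in order of first appearance, and records keep their original order inside each group. A sketch, with "groups" as an illustrative name:

```mathematica
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};

(* Group full records that share the same {field 4, field 5} key *)
groups = GatherBy[records, #[[{4, 5}]] &];

(* Group sizes, comparable to Length /@ x in In[4] *)
Length /@ groups
(* -> {2, 1, 2} *)
```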

Here we see that there are two subsets with duplicates:

In[4]:=
Length /@ x
Out[4]=
{2, 1, 2}

Now select the groups that contain duplicates:

In[5]:=
y = Select[x, Length[#1] >= 2 & ]
Out[5]=
{{{d1, e1}, {d1, e1}}, {{d3, e3}, {d3, e3}}}
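Another sketch for later versions: Tally counts each key and lists the keys in order of first appearance, so the counting of In[4] and the selection of In[5] reduce to one step. The names "keys", "counts" and "dupKeys" are illustrative.

```mathematica
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};
keys = records[[All, {4, 5}]];

(* {key, count} pairs, one per distinct key *)
counts = Tally[keys]
(* -> {{{d1, e1}, 2}, {{d2, e2}, 1}, {{d3, e3}, 2}} *)

(* Keys that occur in more than one record *)
dupKeys = First /@ Select[counts, Last[#] >= 2 &]
(* -> {{d1, e1}, {d3, e3}} *)
```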

and find the positions of the records which have duplicate pairs:

In[7]:=
Table[Position[records, {x1_, x2_, x3_, x4_, x5_,
     x6_} /; MemberQ[y[[j]], {x4, x5}]],
  {j, 1, Length[y]}]
Out[7]=
{{{1}, {4}}, {{3}, {5}}}
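The pattern in In[7] needs one blank per field, which becomes unwieldy for records of about 30 fields. A sketch of an alternative, assuming a version that has GatherBy: group the record indices by key, then keep the groups of size two or more. The result is a flat list of indices per group rather than the {{1}, {4}} form above; "indexGroups" and "dupPositions" are illustrative names.

```mathematica
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};

(* Group indices 1..n by the {field 4, field 5} key of the record they point to *)
indexGroups = GatherBy[Range[Length[records]], records[[#, {4, 5}]] &]
(* -> {{1, 4}, {2}, {3, 5}} *)

(* Only the groups that actually contain duplicates *)
dupPositions = Select[indexGroups, Length[#] >= 2 &]
(* -> {{1, 4}, {3, 5}} *)
```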

Now you can do whatever you want with the records that have duplicate pairs
(you didn't specify whether you wanted to eliminate them altogether or just
keep one of them). If you want to keep just one of each group, take the first
position in each group

In[8]:=
#[[1]]&/@%
Out[8]=
{{1},{3}}

and delete the rest. Sorry if this looks too elaborate, but it works.
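A sketch of the final step, taking the positions in the form produced by In[7] (typed in literally so the block stands alone). Deleting all but the first position of each group keeps one record per key; deleting every listed position removes the duplicated records altogether. The last line uses DeleteDuplicates with a comparison function, available only in later versions, which does roughly what the original question hoped Union would: it keeps the first record seen for each key. All variable names are illustrative.

```mathematica
records = {{a1, b1, c1, d1, e1, f1}, {a2, b2, c2, d2, e2, f2},
   {a3, b3, c3, d3, e3, f3}, {a4, b4, c4, d1, e1, f4},
   {a5, b5, c5, d3, e3, f5}};

(* Duplicate-group positions, as returned by In[7] *)
dupGroups = {{{1}, {4}}, {{3}, {5}}};

(* Keep the first record of each group: delete the later positions {{4}, {5}} *)
keepFirst = Delete[records, Sort[Flatten[Rest /@ dupGroups, 1]]]

(* Remove every record that has a duplicate, including the first one *)
dropAll = Delete[records, Sort[Flatten[dupGroups, 1]]]

(* Later versions: two records count as duplicates when fields 4 and 5 agree *)
keepFirst2 = DeleteDuplicates[records, #1[[{4, 5}]] === #2[[{4, 5}]] &]
```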

Tomas Garza
Mexico City

----- Original Message ----- 
From: "Mark Coleman" <mark at markscoleman.com>
To: mathgroup at smc.vnet.net
Subject: [mg48112] [mg48063] Eliminate duplicates using Union [] in a


> I know that if one applies the Union command to a list of values, it
> will return the list with duplicates removed. Can I use the Union
> command to eliminate duplicates when I have a list of data "records"?
> Specifically, I have a large data set that I read into a list on a
> row-oriented basis. Each element of the list consists of another list
> of about 30 different fields. Two of these fields are geographic
> indicators. If two or more records have identical values in these two
> fields, then the full records are considered duplicates and need to be
> identified and removed from the larger list (or at least marked for
> removal). Can Union do this?
>
> Thanks,
>
> -Mark
>
>
