MathGroup Archive 2009

Re: Definition of the similarity in a set of integers

  • To: mathgroup at smc.vnet.net
  • Subject: [mg96419] Re: Definition of the similarity in a set of integers
  • From: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
  • Date: Fri, 13 Feb 2009 03:43:12 -0500 (EST)
  • Organization: The Open University, Milton Keynes, UK
  • References: <gn11rc$8bq$1@smc.vnet.net>

In article <gn11rc$8bq$1 at smc.vnet.net>,
 Ryan Markley <overgeo at gmail.com> wrote:

> Hello, I have two sets of integers, e.g.
> 
> S1 = (25,14,32,45) and S2 = (26,12,31,48)
> 
> I want to define an operation, similar to the variance, that tells me how
> similar both sets are. For example, for the two sets above the result
> should show that they are similar, because both sets are in fact
> similar.
> 
> The problem with the variance is this:
> 
> S1 = (25,1,1,1) and S2 = (1,1,25,1): these two sets have the same
> variance, but they are completely different. What mathematical operation
> can I use to do what I am looking for?
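The problem with the variance is easy to confirm. The variance depends only on which values occur and how often, not on their order, so any reordering of the same numbers gives the same result. A minimal sketch in Python, using the standard-library `statistics.variance` (sample variance) purely for illustration:

```python
from statistics import variance

# The two lists from the question: the same values in a different order.
s1 = [25, 1, 1, 1]
s2 = [1, 1, 25, 1]

# variance() ignores order, so both lists give the same value.
v1 = variance(s1)
v2 = variance(s2)
print(v1 == v2)  # True
```

Any measure that is meant to tell these two lists apart therefore has to compare them position by position.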

Note that what you call "sets" are not sets as usually defined in 
mathematics, i.e. collections of *distinct* objects. That is, S1 = (25,1,1,1) 
as a set is {1, 25} and S2 = (1,1,25,1) as a set is also {1, 25}, which 
clearly shows that S1 and S2 are equal as sets. On the other hand, the sets 
S1 = {25,14,32,45} and S2 = {26,12,31,48} may be deemed very dissimilar 
since they have no element in common. I think the objects you are 
dealing with are better described as vectors, or ordered lists of integers.
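The distinction between sets and ordered lists can be seen directly with Python's built-in `set` and `list` types (a small illustration, not taken from the original thread):

```python
# As sets, duplicates and order are discarded.
a = [25, 1, 1, 1]
b = [1, 1, 25, 1]
print(set(a) == set(b))  # True: both are {1, 25}

# As ordered lists (vectors), they differ.
print(a == b)            # False

# The first pair from the question has no element in common as sets.
c = {25, 14, 32, 45}
d = {26, 12, 31, 48}
print(c & d)             # set()
```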

Now, assuming you only compare vectors of equal length, you could use 
the correlation distance or the cosine distance, among the many measures 
available in Mathematica. See "Distance and Similarity Measures" at

http://reference.wolfram.com/mathematica/guide/DistanceAndSimilarityMeasures.html
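Both measures are short formulas. The Python sketch below assumes the standard definitions: cosine distance is 1 - u.v/(|u| |v|), and correlation distance is the cosine distance of the mean-centered vectors, i.e. 1 minus the Pearson correlation. With these definitions it reproduces the Mathematica outputs in the session below. The function names `cosine_distance` and `correlation_distance` are hypothetical, chosen for this illustration only.

```python
import math

def _dot(u, v):
    # Inner product of two equal-length sequences.
    return sum(x * y for x, y in zip(u, v))

def _norm(u):
    # Euclidean length of a sequence.
    return math.sqrt(_dot(u, u))

def cosine_distance(u, v):
    """Hypothetical helper: 1 - u.v / (|u| |v|)."""
    return 1 - _dot(u, v) / (_norm(u) * _norm(v))

def correlation_distance(u, v):
    """Hypothetical helper: cosine distance of the mean-centered vectors,
    i.e. 1 - Pearson correlation."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return cosine_distance([x - mu for x in u], [y - mv for y in v])

print(correlation_distance([25, 14, 32, 45], [26, 12, 31, 48]))
print(cosine_distance([25, 1, 1, 1], [1, 1, 25, 1]))
```

Both values lie between 0 and 2; small values mean "similar", and unlike the variance, both measures change when the entries are reordered.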


For instance,

In[1]:= S1 = {25, 14, 32, 45};
S2 = {26, 12, 31, 48};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out[3]= 0.00361843

Out[4]= 0.00152087

In[5]:= S1 = {25, 1, 1, 1};
S2 = {1, 1, 25, 1};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out[7]= 1.33333

Out[8]= 0.917197
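The value 1.33333 is exactly 4/3. Working both definitions by hand for this pair, assuming the standard formulas for correlation distance and cosine distance, shows where the two numbers come from:

```latex
\begin{aligned}
&\bar{S}_1 = \bar{S}_2 = \tfrac{28}{4} = 7, \qquad
  S_1 - 7 = (18, -6, -6, -6), \qquad S_2 - 7 = (-6, -6, 18, -6) \\
&\rho = \frac{(18)(-6) + (-6)(-6) + (-6)(18) + (-6)(-6)}{\sqrt{432}\,\sqrt{432}}
      = \frac{-144}{432} = -\frac{1}{3},
  \qquad 1 - \rho = \frac{4}{3} \approx 1.33333 \\
&1 - \frac{S_1 \cdot S_2}{\lVert S_1 \rVert\,\lVert S_2 \rVert}
  = 1 - \frac{25 + 1 + 25 + 1}{\sqrt{628}\,\sqrt{628}}
  = 1 - \frac{52}{628} \approx 0.917197
\end{aligned}
```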

In[9]:= S1 = {24, 1};
S2 = {25, 2};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out[11]= 0.

Out[12]= 0.00072905

In[13]:= S1 = {25, 1};
S2 = {1, 25};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out[15]= 2.

Out[16]= 0.920128
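The two-element examples show a property of the correlation distance worth keeping in mind. Centering removes any common shift and the normalisation removes any positive scale, so {24, 1} and {25, 2} are "identical" for this measure. Moreover, with only two distinct entries the centered vectors are multiples of (1, -1), so the correlation is always +1 or -1 and the distance is always 0 or 2. A minimal Python check, again assuming the standard definition (the helper name `correlation_distance` is hypothetical):

```python
import math

def correlation_distance(u, v):
    """Hypothetical helper: 1 - Pearson correlation of u and v.
    Raises ZeroDivisionError if u or v is constant."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cu = [x - mu for x in u]
    cv = [y - mv for y in v]
    dot = sum(x * y for x, y in zip(cu, cv))
    return 1 - dot / math.sqrt(sum(x * x for x in cu) * sum(y * y for y in cv))

# A shift ([24, 1] -> [25, 2]) or a positive rescaling
# ([24, 1] -> [240, 10]) leaves the distance at (about) 0.
print(correlation_distance([24, 1], [25, 2]))
print(correlation_distance([24, 1], [240, 10]))

# Two-element vectors: same ordering gives about 0, opposite ordering about 2.
for v in ([5, 3], [3, 5], [100, -7], [-7, 100]):
    print(v, correlation_distance([25, 1], v))
```

For short vectors, then, the cosine distance is the more informative of the two, since it still reflects the magnitudes of the entries.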

Regards,
--Jean-Marc
