Re: Definition of the similarity in a set of integers

*To*: mathgroup at smc.vnet.net
*Subject*: [mg96419] Re: Definition of the similarity in a set of integers
*From*: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
*Date*: Fri, 13 Feb 2009 03:43:12 -0500 (EST)
*Organization*: The Open University, Milton Keynes, UK
*References*: <gn11rc$8bq$1@smc.vnet.net>

In article <gn11rc$8bq$1 at smc.vnet.net>,
Ryan Markley <overgeo at gmail.com> wrote:

> Hello I have two sets of integers eg
>
> S1 = (25,14,32,45) and S2 = (26,12,31,48)
>
> I want to define an operation similar to the variance that give me how
> similar both sets are, for example in the above example for both sets
> the results I have to get need to be similar because both sets are
> similar.
>
> The problem with the variance is this
>
> S1 = (25,1,1,1) and S2 = (1,1,25,1) these two sets have the same
> variance but they are completly different. What mathematical operation
> can I use to do what I am looking for.

Note that what you call "sets" are not sets as usually defined in
mathematics: a collection of *distinct* objects. That is, S1 = (25,1,1,1)
as a set is {1, 25} and S2 = (1,1,25,1) as a set is {1, 25}, which
clearly shows that both sets S1 and S2 are equal. OTOH, the sets
S1 = {25,14,32,45} and S2 = {26,12,31,48} may be deemed as very
dissimilar since they have no element in common.

I think the objects you are dealing with can be described as vectors or
ordered lists of integers.

Now, assuming you are comparing only vectors of equal length, you could
use the correlation or the cosine distance, among many others available
in Mathematica. See "Distance and Similarity Measures" at

http://reference.wolfram.com/mathematica/guide/DistanceAndSimilarityMeasures.html

For instance,

    In[1]:= S1 = {25, 14, 32, 45};
            S2 = {26, 12, 31, 48};
            CorrelationDistance[S1, S2] // N
            CosineDistance[S1, S2] // N

    Out[3]= 0.00361843

    Out[4]= 0.00152087

    In[5]:= S1 = {25, 1, 1, 1};
            S2 = {1, 1, 25, 1};
            CorrelationDistance[S1, S2] // N
            CosineDistance[S1, S2] // N

    Out[7]= 1.33333

    Out[8]= 0.917197

    In[9]:= S1 = {24, 1};
            S2 = {25, 2};
            CorrelationDistance[S1, S2] // N
            CosineDistance[S1, S2] // N

    Out[11]= 0.

    Out[12]= 0.00072905

    In[13]:= S1 = {25, 1};
             S2 = {1, 25};
             CorrelationDistance[S1, S2] // N
             CosineDistance[S1, S2] // N

    Out[15]= 2.

    Out[16]= 0.920128

Regards,
--Jean-Marc
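
For reference, the two distances used in the session above can be written out by hand, which makes it easier to see why they react to the order of the elements while the variance does not. This is only a minimal sketch: the names cosineDist and correlationDist are hypothetical helpers introduced here, not built-ins, and both assume real-valued vectors of equal length that are neither zero nor (for the correlation version) constant.

```mathematica
(* Hypothetical helper names, introduced for illustration only. *)

(* Cosine distance: 1 minus the cosine of the angle between u and v. *)
cosineDist[u_List, v_List] := 1 - u.v/(Norm[u] Norm[v])

(* Correlation distance: the cosine distance after each vector *)
(* has been centered on its own mean.                          *)
correlationDist[u_List, v_List] := cosineDist[u - Mean[u], v - Mean[v]]

(* Same inputs as In[5] and In[13] in the session above. *)
N[correlationDist[{25, 1, 1, 1}, {1, 1, 25, 1}]]
N[cosineDist[{25, 1}, {1, 25}]]
```

Under these definitions the two calls should agree with Out[7] and Out[16] above (about 1.33333 and 0.920128). Because the correlation distance centers each vector first, shifting every entry by the same constant leaves it unchanged, which is why {24, 1} and {25, 2} come out at distance 0 even though the cosine distance between them is small but nonzero.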