Hierarchical clustering, Ward's linkage
- To: mathgroup at smc.vnet.net
- Subject: [mg117962] Hierarchical clustering, Ward's linkage
- From: Rachel Blakers <rachel.blakers at anu.edu.au>
- Date: Wed, 6 Apr 2011 05:12:08 -0400 (EDT)
Dear MathGroup,

I am using Mathematica's hierarchical clustering package to analyse time series data. The method I am using is DirectAgglomerate with Ward's linkage:

    DirectAgglomerate[D, Linkage -> "Ward"]

where D is a distance matrix calculated from the data. I have a couple of questions that I hope the group can help me with:

1) Does Mathematica use the same algorithm as the one described in Ward's 1963 paper? (Ward, "Hierarchical Grouping to Optimize an Objective Function", Journal of the American Statistical Association, Vol. 58, No. 301 (Mar., 1963), pp. 236-244). Specifically, how does Mathematica modify Ward's objective function so that it works with the distance matrix D rather than with the individual observations? In his paper, Ward presents the following objective function (the error sum of squares) for univariate data:

    ESS = ||X - meanX||^2 = Sum(xi^2) - (1/n)*(Sum(xi))^2

where X is a vector of observations x1, x2, ..., xn and meanX is their mean.

2) Should the distance matrix D be generated using Euclidean distance, or is any distance metric appropriate? I am currently using Manhattan distance.

I am new to MathGroup, so I hope that my post is appropriate. Thank you very much for your help.

Rachel
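For concreteness, the objective function quoted above can be checked numerically. The following is only a minimal sketch: the sample x is hypothetical, and the third expression uses the standard identity that the error sum of squares equals the sum of all pairwise squared Euclidean distances divided by 2n. It illustrates why squared Euclidean distances are the natural input for this objective; it is not a statement about what DirectAgglomerate does internally.

```mathematica
(* Hypothetical univariate sample, chosen only for illustration *)
x = {2, 4, 4, 7, 9};
n = Length[x];

(* Ward's error sum of squares, written in the two equivalent forms above *)
ess1 = Total[(x - Mean[x])^2];
ess2 = Total[x^2] - Total[x]^2/n;

(* The same quantity computed from pairwise squared Euclidean distances only *)
d2 = Outer[(#1 - #2)^2 &, x, x];   (* n x n matrix of squared differences *)
ess3 = Total[d2, 2]/(2 n);          (* sum over all entries, divided by 2n *)

{ess1, ess2, ess3}
(* {154/5, 154/5, 154/5} *)
```

The three values agree for this sample. The identity relies on the distances being Euclidean, so it does not carry over as written to a Manhattan distance matrix.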