Re: Best method to break apart a data set
- To: mathgroup at smc.vnet.net
- Subject: [mg114063] Re: Best method to break apart a data set
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Mon, 22 Nov 2010 07:39:04 -0500 (EST)
----- Original Message -----
> From: "EliL" <elansey at gmail.com>
> To: mathgroup at smc.vnet.net
> Sent: Tuesday, November 16, 2010 4:05:49 AM
> Subject: [mg113864] Best method to break apart a data set
> For
> stuff = {{0, 3290}, {0, 8576}, {0, 12081}, {4569828, 3336}, {4569828,
> 8581}, {4569828, 12109}, {9139656, 3468}, {9139656,
> 8600}, {9139656, 12193}, {13709484, 3671}, {13709484,
> 8637}, {13709484, 12328}, {18279312, 3924}, {18279312,
> 8698}, {18279312, 12513}, {22849141, 4205}, {22849141,
> 8791}, {22849141, 12741}, {22849141, 15220}, {27418969,
> 4494}, {27418969, 8925}, {27418969, 13009}, {27418969,
> 15637}, {31988797, 4774}, {31988797, 9106}, {31988797,
> 13312}, {31988797, 15995}, {36558625, 5032}, {36558625,
> 9342}, {36558625, 13646}, {36558625, 16320}, {41128453,
> 5259}, {41128453, 9633}, {41128453, 14008}, {45698281,
> 5453}, {45698281, 9979}, {45698281, 14394}, {50268109,
> 5612}, {50268109, 10377}, {50268109, 14802}, {54837937,
> 5742}, {54837937, 10819}, {54837937, 15230}, {59407765,
> 5846}, {59407765, 11298}, {59407765, 15675}, {63977593,
> 5929}, {63977593, 11809}, {63977593, 16135}, {68547422,
> 5995}, {68547422, 12345}, {73117250, 6048}, {73117250,
> 12902}, {77687078, 6091}, {77687078, 13475}, {82256906,
> 6125}, {82256906, 14062}, {86826734, 6153}, {86826734,
> 14660}, {91396562, 6176}, {91396562, 15268}, {95966390,
> 6195}, {95966390, 15884}, {100536218, 6210}, {105106046,
> 6223}, {109675875, 6233}}
>
> If you ListPlot it you'll see 4 distinct curves. I'd love to have
> Mathematica break them apart into four separate sets. I've tried
> using FindClusters, but the default doesn't work. I've also tried
> using DistanceFunction -> (Norm[#1[[2]] - #2[[2]]]^2 &) to pull only
> the closeness in the y-axis. This successfully separates the bottom
> curve, but doesn't break the 3 upper curves up in the right way.
>
> Any other ideas for how to break this up? Or different Distance
> Functions to use?
> Thanks so much,
> Eli.

With rescaling you can use Nearest to find sets of neighbors in such a way that they do not jump components. I do not think this works in general, but it works for your set. You can then treat this as a graph components problem: find the connected components, and thus segment the point set. Here is the code I used for this purpose.

modstuff = stuff /. {x_, y_} :> {x, 10^4*y};
nf = Nearest[modstuff];
neighbors = Map[nf[#, 3] &, modstuff] /. {x_, y_} :> {x, y/10^4};
Needs["GraphUtilities`"];
graph = Flatten[
  Map[{#[[1]] -> #[[2]], #[[2]] -> #[[3]]} &, neighbors]];
pieces = StrongComponents[graph]; (* could instead use WeakComponents *)
Show[Map[ListPlot[#, Joined -> True] &, pieces]]

Daniel Lichtblau
Wolfram Research
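
The rescale, neighbor, and components steps described above can also be sketched on a small made-up data set using the built-in Graph functions (available from version 8) rather than the GraphUtilities package. This is only an illustration under stated assumptions, not the code from the post: the toy points `pts`, the scale factor 10, and the names `scaled`, `nbrs`, `pairs`, `g` and `pieces` are all invented for the example. The two rows are 3 apart in y but 10 apart in x, so without rescaling the nearest neighbor of a point would sit on the other row.

```mathematica
(* ASSUMPTION: toy data, not the poster's. Two horizontal rows, six points each. *)
pts = Join[Table[{10. i, 0.}, {i, 0, 5}], Table[{10. i, 3.}, {i, 0, 5}]];

(* Stretch y so that neighbors within a row are closer than points on the other row. *)
scaled = {#[[1]], 10 #[[2]]} & /@ pts;

(* Nearest[data -> Automatic] returns positions in the list instead of the points. *)
nf = Nearest[scaled -> Automatic];

(* For each point: its own index first (distance 0), then its two nearest others. *)
nbrs = nf[#, 3] & /@ scaled;

(* Link each point to both of those neighbors; sort and dedupe the index pairs. *)
pairs = DeleteDuplicates[
   Sort /@ Flatten[{{#[[1]], #[[2]]}, {#[[1]], #[[3]]}} & /@ nbrs, 1]];
g = Graph[Range[Length[pts]], UndirectedEdge @@@ pairs];

(* Each connected component is one curve; map indices back to the original points. *)
pieces = SortBy[pts[[#]], First] & /@ ConnectedComponents[g];
Length /@ pieces
```

With this toy input the result should be two pieces of six points each. As with the original approach, whether the neighbors stay on their own curve depends on the scale factor and on how many neighbors are requested, so both would need adjusting for other data.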