MathGroup Archive: September 2012 [00318]

Fast selection of lots of elements from a large list

To: mathgroup at smc.vnet.net
Subject: [mg128234] Fast selection of lots of elements from a large list
From: Mark Coleman <markspcoleman at gmail.com>
Date: Thu, 27 Sep 2012 22:48:27 -0400 (EDT)

Greetings,

I've been using Mathematica to perform cluster analysis on a data set with about 600,000 rows and 60 columns. I've had FindClusters return a unique row identifier (a 12-character string) rather than the clustered data, because I want to "join" these results to another data set for further analysis. To accomplish this I've been using the Position function to identify the element numbers in each cluster.

To give a specific example, my cluster analysis identifies twelve clusters in my original data set. The first of these clusters contains about 15,000 row identifiers. To extract the corresponding data from other data sets, I find the position of each identifier in my original data set using the simple code

q=clusterResults[[1]]; (* row IDs for the first cluster *)
p=Map[Position[rowIDs,#]&,q];

where "rowIDs" is the first column of the other data set, which contains the same string identifiers (rowIDs has about 600,000 sublists). I then Extract these elements ("rows") from the data set and continue my analysis.
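
To make the workflow concrete, here is a minimal toy version of the steps above. The names otherData and rows and all of the values are made up for illustration, and the sketch assumes rowIDs is a flat list of strings; the real data has about 600,000 rows.

```mathematica
(* Hypothetical stand-in for the other data set: first column is the row ID *)
otherData = {
   {"AAA000000001", 1.5},
   {"AAA000000002", 2.5},
   {"AAA000000003", 3.5},
   {"AAA000000004", 4.5}};

(* Assumption: rowIDs is a flat list of the ID strings *)
rowIDs = otherData[[All, 1]];

(* Hypothetical stand-in for clusterResults[[1]] *)
q = {"AAA000000002", "AAA000000004"};

(* One Position call per ID; each call searches all of rowIDs *)
p = Map[Position[rowIDs, #] &, q];

(* Each Position result is a list of positions, so flatten one level
   before handing the positions to Extract *)
rows = Extract[otherData, Flatten[p, 1]];
```

With this toy input, p is {{{2}}, {{4}}} and rows is {{"AAA000000002", 2.5}, {"AAA000000004", 4.5}}. Because every element of q triggers its own search through rowIDs, the work grows with the product of the two list lengths.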

Unfortunately this is quite slow. Doing this on a sample of 1000 elements requires 340 seconds on my desktop computer. Some of my clusters have many tens of thousands of elements. I'm hoping someone can suggest a faster way of doing this.

Thanks,

Mark

Follow-Ups:
- Re: Fast selection of lots of elements from a large list
  - From: Sseziwa Mukasa <mukasa@gmail.com>
