Fast selection of lots of elements from a large list
- To: mathgroup at smc.vnet.net
- Subject: [mg128234] Fast selection of lots of elements from a large list
- From: Mark Coleman <markspcoleman at gmail.com>
- Date: Thu, 27 Sep 2012 22:48:27 -0400 (EDT)
I've been using Mathematica to perform cluster analysis on a data set with about 600,000 rows and 60 columns. I've had the FindClusters function return a unique row identifier (a 12-character string) rather than the clustered data, because I want to "join" these results to another data set for further analysis. To accomplish this I've been using the Position function to identify the element numbers in each cluster.
To give a specific example, my cluster analysis identifies twelve clusters in my original data set. The first of these clusters contains about 15,000 row identifiers. To extract the corresponding data from other data sets, I find the position of each identifier in my original data set using the simple code
q=clusterResults[]; (* row id's for first cluster *)
where "rowIDs" is the first column of the other data set, which contains the same string identifiers (rowIDs has about 600,000 sublists). I then Extract these elements ("rows") from the data set and continue my analysis.
Unfortunately this is quite slow. Doing this on a sample of 1000 elements requires 340 seconds on my desktop computer. Some of my clusters have many tens of thousands of elements. I'm hoping someone can suggest a faster way of doing this.
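For reference, the lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original code from the post: the toy values, the name otherData, and the exact way Position is mapped over the identifiers are all hypothetical. It does show why the approach scales poorly, since every identifier in q triggers a separate scan of the whole of rowIDs.

```mathematica
(* Hypothetical toy data standing in for the real 600,000-row data sets *)
rowIDs = {{"ID0000000001"}, {"ID0000000002"}, {"ID0000000003"}, {"ID0000000004"}};
otherData = {{"ID0000000001", 1.2}, {"ID0000000002", 3.4},
             {"ID0000000003", 5.6}, {"ID0000000004", 7.8}};

(* Stand-in for the row IDs of one cluster *)
q = {"ID0000000003", "ID0000000001"};

(* One Position call per identifier: each call scans all of rowIDs,
   so the work grows as Length[q] * Length[rowIDs] *)
pos = Flatten[Position[rowIDs, #] & /@ q, 1];

(* Keep only the row index of each match and pull those rows *)
rows = Extract[otherData, pos[[All, {1}]]]
```

With 1000 identifiers and about 600,000 sublists, this pattern amounts to 1000 full passes over rowIDs, which is in line with the long run time reported above.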