Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
- To: mathgroup at smc.vnet.net
- Subject: [mg52450] Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
- From: Bill Rowe <readnewsciv at earthlink.net>
- Date: Sat, 27 Nov 2004 01:41:18 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
On 11/26/04 at 1:04 AM, bbongiorno at attglobal.net (Benedetto
Bongiorno) wrote:
>I have been using Mathematica for financial analysis purposes and
>have been developing note book programs for about 5 years. My
>skills at this are self taught with help from Wolfram training and
>support. The largest challenge has been the speed in the analysis
>of large data sets. The following is an example of a routine that
>takes many hours. PLEASE HELP AND SHOW HOW I CAN IMPROVE THE
>ROUTINE TO MAKE THE RUN TIME SHORTER.
>Equipment: HP XP, 3.24 processor, 2 Gigs, Mathematica 5.01. Data set a =
>257470 by 40, mixed numeric and string fields, but each field
>(column) is either numeric or string. Data set b = 258705 by 5,
>all fields are numeric.
>Objective: RowJoin the rows from each data set that have the same
>ID field in their corresponding column one.
<snip code>
In my experience, one of the keys to speeding up Mathematica is to avoid procedural constructs like For and use functional programming instead.
For example, consider
m=10000;
data=Table[Random[],{m}];
sum=0;
Timing[For[n=1,n<m+1,n++,sum+=data[[n]]]]
{0.18 Second,Null}
sum
4991.88
Timing[Plus@@data]
{0.02 Second,4991.88}
Both routines give the same result, but the functional method runs roughly 10 times faster (0.02 seconds versus 0.18 seconds).
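The same sum can be written in a few other loop-free ways. The lines below are only a sketch for comparison, not timings I have measured. Total is built in as of version 5, and Fold has been available much longer. The name data2 is just an illustrative variable.

```mathematica
(* Illustrative data; any flat list of numbers will do *)
data2 = Table[Random[], {10000}];

(* Three loop-free ways to add up the elements *)
s1 = Plus @@ data2;          (* Apply Plus to the list, as above *)
s2 = Total[data2];           (* built-in summation *)
s3 = Fold[Plus, 0, data2];   (* explicit left fold starting from 0 *)

(* The three values should agree up to floating-point rounding *)
{s1, s2, s3}
```

Wrapping each line in Timing, as in the For example above, shows how they compare on a given machine.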
As for solving your specific problem, I have a package I wrote for my use that does this much faster than what you describe above. In my case, I have data set up in matrices with the following format:
{
{name1, name2, name3, name4, ..., nameN},
{x1, x2, x3, x4, ..., xN},
....
{x1m, x2m, x3m, x4m, ..., xNm}}
The first row consists of symbols with no value assigned to them. The remaining rows are all numeric or symbols with no values assigned.
The function I developed for doing something similar to what you want has the syntax
MergeData[dataset1, dataset2, ... datasetN, name] where dataset1 through datasetN are the data sets to merge as you describe and name is the symbol that designates which column holds the common values.
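To illustrate how a symbol in the first row can designate a column, here is a minimal sketch. It is not code from the package. The helper names columnIndex and columnValues and the sample symbols id, price and qty are made up for this example, and the symbols must have no values assigned.

```mathematica
(* Sample data set: a header row of unassigned symbols, then data rows *)
ds = {{id, price, qty},
      {1, 9.5, 3},
      {2, 4.25, 10}};

(* Hypothetical helper: position of a named column in the header row *)
columnIndex[data_List, name_Symbol] := Position[First[data], name][[1, 1]]

(* Hypothetical helper: all values in the named column, header excluded *)
columnValues[data_List, name_Symbol] := Rest[data][[All, columnIndex[data, name]]]

columnIndex[ds, qty]      (* 3 *)
columnValues[ds, price]   (* {9.5, 4.25} *)
```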
This is one of several functions I use to manipulate data sets. The package is written as an enhancement of the standard package Statistics`DataManipulation`.
I have thought off and on about submitting this package to MathSource, but have never gotten around to writing up documentation for each function. All I have done so far is include usage messages for each function, which summarize the intended usage and show the required syntax.
If you are interested, contact me offline and I will send you a copy. Note that since I chose to use symbols with no assigned value for the non-numeric entries, my package may not work as is with your data. Strings may cause problems for my code; I've never tried it with strings.
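In the meantime, here is a rough sketch of one loop-free way to join rows on a common ID in column one. This is not the MergeData code from my package. The name joinByID is made up for this example, and it assumes the IDs match exactly (an integer 1 will not match a real 1.) and that each ID appears at most once in the second data set (if it repeats, the last row wins). Strings in the other columns are fine here.

```mathematica
(* Hypothetical helper: append the columns of b to the rows of a with the same ID *)
joinByID[a_List, b_List] := Module[{rest, joined},
  rest[_] = Null;                          (* default: ID not present in b *)
  Scan[(rest[First[#]] = Rest[#]) &, b];   (* hashed lookup keyed on column one of b *)
  joined = Map[
    With[{r = rest[First[#]]},
      If[r === Null, Null, Join[#, r]]] &, (* join when the ID is found, else mark Null *)
    a];
  DeleteCases[joined, Null]                (* drop rows of a with no match in b *)
]

a = {{1, "x", 10.}, {2, "y", 20.}, {3, "z", 30.}};
b = {{3, 300, 301}, {1, 100, 101}, {4, 400, 401}};

joinByID[a, b]
(* {{1, "x", 10., 100, 101}, {3, "z", 30., 300, 301}} *)
```

Each row is looked up once rather than compared against every row of the other data set, so the work grows with the sum of the two lengths rather than their product.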
--
To reply via email subtract one hundred and four