MathGroup Archive: November 2004 [00698]

Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

To: mathgroup at smc.vnet.net
Subject: [mg52450] Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
From: Bill Rowe <readnewsciv at earthlink.net>
Date: Sat, 27 Nov 2004 01:41:18 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

On 11/26/04 at 1:04 AM, bbongiorno at attglobal.net (Benedetto
Bongiorno) wrote:

>I have been using Mathematica for financial analysis purposes and
>have been developing notebook programs for about 5 years. My
>skills at this are self-taught with help from Wolfram training and
>support. The largest challenge has been the speed in the analysis
>of large data sets. The following is an example of a routine that
>takes many hours. PLEASE HELP AND SHOW HOW I CAN IMPROVE THE
>ROUTINE TO MAKE THE RUN TIME SHORTER.

>Equipment: HP XP, 3.24 processor, 2 Gigs, Mathematica 5.01
>Data set a = 257470 by 40, mixed numeric and string fields, but each
>field (column) is either numeric or string
>Data set b = 258705 by 5, all fields are numeric

>Objective:  RowJoin the rows from each data set that have the same
>ID field in their corresponding column one.

<snip code>

In my experience, one of the keys to speeding up Mathematica code is to avoid explicit loops such as For and to use functional programming instead.

For example, consider

m=10000;
data=Table[Random[],{m}];
sum=0;
Timing[For[n=1,n<m+1,n++,sum+=data[[n]]]]
{0.18 Second,Null}

sum
4991.88

Timing[Plus@@data]
{0.02 Second,4991.88}

Both routines give the same result, but the functional method runs roughly 9 times faster here (0.02 versus 0.18 seconds).
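
Other loop-free forms can be timed the same way. As a sketch, assuming Mathematica 5.0 or later (where the built-in Total is available), either of the following sums the same list without an explicit For loop:

Timing[Total[data]]
Timing[Fold[Plus, 0, data]]

Timings are omitted here because they depend on the machine and version. Total is the built-in meant for this job, so it would normally be the first thing to try.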

As for solving your specific problem, I have a package I wrote for my use that does this much faster than what you describe above. In my case, I have data set up in matrices with the following format:

{
{name1, name2, name3, name4, ..., nameN},
{x1, x2, x3, x4, ..., xN},
 ...
{x1m, x2m, x3m, x4m, ..., xNm}}

The first row consists of symbols with no value assigned to them. The remaining rows are all numeric or symbols with no values assigned.
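
To make the layout concrete, here is a small made-up data set in that format. The symbols id, price and volume and all of the values are hypothetical, chosen only for illustration:

(* first row: column names, symbols with no values assigned *)
(* remaining rows: numeric entries *)
data1 = {
  {id, price, volume},
  {101, 12.5, 300},
  {102, 13.1, 250}};

(* the header row and the data rows can be separated like this *)
First[data1]
Rest[data1]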

The function I developed for doing something similar to what you want has the syntax

MergeData[dataset1, dataset2, ..., datasetN, name]

where dataset1 through datasetN are the data sets to merge as you describe and name is the symbol designating the column that holds the common values.
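
As a rough illustration of the general idea only: the following is not the MergeData code from the package, and joinOnFirstColumn is a made-up name. It is a minimal sketch of joining two matrices on the ID in column one, assuming there is no header row and that each ID appears at most once in the second matrix. It avoids nested loops by storing the rows of b as definitions of a local symbol, which Mathematica looks up by hashing:

joinOnFirstColumn[a_List, b_List] := Module[{lookup},
  (* index the rows of b by the ID in column one; *)
  (* if an ID repeats in b, the last row wins *)
  Scan[(lookup[First[#]] = Rest[#]) &, b];
  (* keep each row of a whose ID was found in b, *)
  (* and append the remaining columns of the matching b row *)
  Cases[a,
    row_ /; Head[lookup[First[row]]] === List :>
      Join[row, lookup[First[row]]]]]

a = {{1, "x", 10}, {2, "y", 20}, {3, "z", 30}};
b = {{3, 0.3}, {1, 0.1}, {4, 0.4}};
joinOnFirstColumn[a, b]
{{1, x, 10, 0.1}, {3, z, 30, 0.3}}

Each matrix is traversed once, so the work grows with the total number of rows rather than with the product of the two row counts. Rows whose ID appears in only one matrix are dropped.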

This is one of several functions I use to manipulate data sets. The package is written as an enhancement of the standard package Statistics`DataManipulation`.

I have thought off and on about submitting this package to MathSource, but have never gotten around to writing up documentation for each function. All I have done so far is include usage messages for each function, which summarize the intended usage and show the required syntax.

If you are interested, contact me offline and I will send you a copy. Note that since I chose to use symbols with no assigned value for the non-numeric entries, my package may not work as is with your data. The use of strings may cause problems for my code; I have never tried the code with strings.

--
To reply via email subtract one hundred and four
