MathGroup Archive 2004



Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

  • To: mathgroup at
  • Subject: [mg52450] Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
  • From: Bill Rowe <readnewsciv at>
  • Date: Sat, 27 Nov 2004 01:41:18 -0500 (EST)
  • Sender: owner-wri-mathgroup at

On 11/26/04 at 1:04 AM, bbongiorno at (Benedetto
Bongiorno) wrote:

>I have been using Mathematica for financial analysis purposes and
>have been developing notebook programs for about 5 years. My
>skills at this are self-taught with help from Wolfram training and
>support. The largest challenge has been the speed in the analysis
>of large data sets. The following is an example of a routine that

>Equipment: HP XP, 3.24 processor, 2 Gigs, Mathematica 5.01
>Data set a = 257470 by 40, mixed numeric and string fields, but each
>field (column) is either numeric or string
>Data set b = 258705 by 5, all fields are numeric

>Objective:  RowJoin the rows from each data set that have the same
>ID field in their corresponding column one.

<snip code>
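The stated objective is essentially a join on a key column. A common way to make such a join fast is to index one data set by its ID once, instead of searching it again for every row of the other. The Python sketch below illustrates that idea only; it is not the snipped code from the original post. `row_join_by_id` and the sample rows are hypothetical, IDs in `b` are assumed to be unique, rows of `a` without a match are dropped, and `b`'s copy of the ID is omitted from each joined row.

```python
def row_join_by_id(a, b):
    """Join each row of `a` to the row of `b` that has the same ID in column one.

    Assumptions (hypothetical, for illustration): IDs in `b` are unique, rows of
    `a` with no matching ID are dropped, and `b`'s copy of the ID is omitted.
    """
    # One pass over b: map each ID to its full row.
    index = {row[0]: row for row in b}
    # One pass over a: dictionary lookups replace a search through b per row.
    return [list(row) + list(index[row[0]][1:]) for row in a if row[0] in index]


# Hypothetical sample data: a has mixed string/numeric columns, b is all numeric.
a = [[101, "ACME", 3.5], [102, "BETA", 1.25], [104, "GAMMA", 7.0]]
b = [[102, 10, 20], [101, 11, 21], [103, 12, 22]]
print(row_join_by_id(a, b))
# -> [[101, 'ACME', 3.5, 11, 21], [102, 'BETA', 1.25, 10, 20]]
```

Building the dictionary costs one pass over `b`, and each lookup takes constant time on average, so the join grows roughly linearly with the number of rows rather than with the product of the two row counts.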

In my experience, one of the keys to speeding up Mathematica code is to avoid constructs like For and to use functional programming instead.

For example, consider:

{0.18 Second,Null}


{0.02 Second,4991.88}

Both routines give the same result, but the functional method runs roughly 10 times faster (0.02 versus 0.18 seconds).
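The Mathematica inputs behind these two timings are not shown. As a loose analogy only, the Python sketch below contrasts an explicit index loop with a built-in reduction over the same data. The function names and data are hypothetical, and the measured ratio depends on the machine and interpreter, so it will not necessarily match the figure above.

```python
import timeit

# Hypothetical data; integers keep both sums exact, so they compare equal.
values = list(range(1, 100_001))


def loop_sum(xs):
    # Procedural style: explicit index loop updating an accumulator.
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total


def builtin_sum(xs):
    # Functional style: hand the whole list to a built-in reduction.
    return sum(xs)


print(loop_sum(values) == builtin_sum(values))  # -> True
print("loop:   ", timeit.timeit(lambda: loop_sum(values), number=20))
print("builtin:", timeit.timeit(lambda: builtin_sum(values), number=20))
```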

As for your specific problem, I have a package, written for my own use, that does this much faster than the approach you describe above. In my case, the data are set up as matrices with the following format:

{{name1, name2, name3, name4, ..., nameN},
 {x1, x2, x3, x4, ..., xN},
 {x1m, x2m, x3m, x4m, ..., xNm}}

The first row consists of symbols with no value assigned to them. The remaining rows are all numeric or symbols with no values assigned.

The function I developed for doing something similar to what you want has the syntax

MergeData[dataset1, dataset2, ..., datasetN, name]

where dataset1 through datasetN are the datasets to merge as you describe, and name is the symbol that designates the column holding the common values.

This is one of several functions I use to manipulate data sets. The package is written as an enhancement of the standard package Statistics`DataManipulation`

I have thought off and on about submitting this package to MathSource, but have never gotten around to writing documentation for each function. So far I have only included usage messages for each function, which summarize its intended use and show the required syntax.

If you are interested, contact me offline and I will send you a copy. Note that since I chose to use symbols with no assigned value for the non-numeric entries, my package may not work as is with your data: strings may cause problems for my code, since I have never tried it with strings.

To reply via email subtract one hundred and four
