Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

*To*: mathgroup at smc.vnet.net
*Subject*: [mg52450] Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
*From*: Bill Rowe <readnewsciv at earthlink.net>
*Date*: Sat, 27 Nov 2004 01:41:18 -0500 (EST)
*Sender*: owner-wri-mathgroup at wolfram.com

On 11/26/04 at 1:04 AM, bbongiorno at attglobal.net (Benedetto Bongiorno) wrote:

>I have been using Mathematica for financial analysis purposes and
>have been developing notebook programs for about 5 years. My
>skills at this are self taught with help from Wolfram training and
>support. The largest challenge has been the speed in the analysis
>of large data sets. The following is an example of a routine that
>takes many hours. PLEASE HELP AND SHOW HOW I CAN IMPROVE THE
>ROUTINE TO MAKE THE RUN TIME SHORTER.

>Equipment HP XP 3.24 processor 2 Gigs Mathematica 5.01
>Data set a = 257470 by 40, mixed numeric and string fields, but each
>field (column) is either numeric or string
>Data set b = 258705 by 5, all fields are numeric

>Objective: RowJoin the rows from each data set that have the same
>ID field in their corresponding column one.

<snip code>

In my experience, one of the keys to speeding up Mathematica is to avoid constructs like For and use functional programming instead. For example, consider

    m = 10000;
    data = Table[Random[], {m}];
    sum = 0;
    Timing[For[n = 1, n < m + 1, n++, sum += data[[n]]]]
    {0.18 Second, Null}

    sum
    4991.88

    Timing[Plus @@ data]
    {0.02 Second, 4991.88}

Both routines give the same result, but the functional method runs roughly 10 times faster.

As for solving your specific problem, I have a package I wrote for my own use that does this much faster than what you describe above. In my case, I have data set up in matrices with the following format

    {{name1, name2, name3, name4, ..., nameN},
     {x1, x2, x3, x4, ..., xN},
     ...,
     {x1m, x2m, x3m, x4m, ..., xNm}}

The first row consists of symbols with no value assigned to them. The remaining rows are all numeric or symbols with no values assigned. The function I developed for doing something similar to what you want has the syntax

    MergeData[dataset1, dataset2, ..., datasetN, name]

where dataset1 through datasetN are the data sets to merge as you describe and name is the symbol that designates which column has the common values.
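To make the idea of merging on a common column concrete, here is a minimal sketch, assuming (as in the original question) that the ID sits in column one of both data sets and that each ID occurs at most once in the second set. The name joinOnFirstColumn is hypothetical and chosen only for illustration; this is not the MergeData function from the package, whose code is not shown in this post. It replaces any row-by-row search with a lookup table built from definitions on a local symbol.

```mathematica
(* Hypothetical helper for illustration only; NOT the MergeData function. *)
(* Assumes the ID is in column one of both a and b, and that IDs in b are
   unique (a repeated ID in b would overwrite the earlier row). *)
joinOnFirstColumn[a_List, b_List] :=
 Module[{lookup},
  (* Build the lookup table: lookup[id] holds the remaining fields of b's row. *)
  Scan[(lookup[First[#]] = Rest[#]) &, b];
  (* Keep only rows of a whose ID is defined in the table, and append
     the matching fields from b. Undefined IDs stay unevaluated, so ListQ fails. *)
  Cases[a,
   (row_ /; ListQ[lookup[First[row]]]) :> Join[row, lookup[First[row]]]]
  ]

a = {{1, "x", 10}, {2, "y", 20}, {3, "z", 30}};
b = {{2, 200}, {3, 300}, {4, 400}};

joinOnFirstColumn[a, b]
(* {{2, "y", 20, 200}, {3, "z", 30, 300}} *)
```

Each set is traversed once, with no explicit For loop and no repeated scanning of one set for every row of the other. String fields in the rows of a pass through untouched, since only the first column is used as the key.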
MergeData is one of several functions I use to manipulate data sets. The package is written as an enhancement of the standard package Statistics`DataManipulation`.

I have thought off and on about submitting this package to MathSource, but have never gotten around to writing up documentation for each function. All I have done so far is include usage messages for each function, which summarize the intended usage and show the required syntax. If you are interested, contact me offline and I will send you a copy.

Note that since I chose to use symbols with no assigned value for the non-numeric entries, my package may not work as is with your data. The use of strings may cause problems for my code; I have never tried the code with strings.

--
To reply via email subtract one hundred and four