Re: Speeding UP Indexing and Joining of DifferentSizeRectangular Matrixes
- To: mathgroup at smc.vnet.net
- Subject: [mg52471] Re: Speeding UP Indexing and Joining of DifferentSizeRectangular Matrixes
- From: David Bailey <dave at Remove_Thisdbailey.co.uk>
- Date: Sun, 28 Nov 2004 01:07:05 -0500 (EST)
- References: <co999c$h67$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Tomas Garza wrote:
> Perhaps something could be done. Please explain your problem in more detail.
> Never mind your program (what is loc01?). What do you mean by joining rows?
> Give a small example with small matrices (say 5 x 2 or something like that).
> What are your present run times?
>
> Tomas Garza
> Mexico City
>
> ----- Original Message -----
> From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
> To: mathgroup at smc.vnet.net
> Subject: [mg52471] Speeding UP Indexing and Joining of DifferentSizeRectangular Matrixes
>
>>Fellow MathGroup,
>>
>>I have been using Mathematica for financial analysis purposes and have been
>>developing note book programs for about 5 years.
>>My skills at this are self taught with help from Wolfram training and support.
>>The largest challenge has been the speed in the analysis of large data sets.
>>The following is an example of a routine that takes many hours.
>>PLEASE HELP AND SHOW HOW I CAN IMPROVE THE ROUTINE TO MAKE THE RUN TIME SHORTER.
>>
>>Equipment HP XP 3.24 processor 2 Gigs
>>Mathematica 5.01
>>Data set a= 257470 by 40, Mixed numeric and string fields, but each field
>>(column) is either or numeric or string
>>Data set b= 258705 by 5, All fields are numeric
>>
>>Objective: RowJoin the rows from each data set that have the same ID field
>>in their corresponding column one.
>>
>>Thank you and Happy Holidays
>>
>>ROUTINE
>>
>>Create Index By Invoice ID
>>
>>firstCol=loc01[[1]];
>>lastCol =loc01[[1]];
>>aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>>Length[aa]
>>257470
>>
>>firstCol=loc04[[1]];
>>lastCol =loc04[[1]];
>>bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
>>Length[bb]
>>258705
>>
>>idx=Intersection[aa,bb];
>>Length[idx]
>>257249
>>
>>n=Length[idx]+1
>>257250
>>
>>Locate Position Of Each Record In aTable
>>
>>ans01={};
>>For[i=1,i<n,i++,
>>step1 = Position[aa,idx[[i]]];
>>AppendTo[ans01,step1]]
>>ans01=Flatten[ans01,1];
>>
>>Locate Position Of Each Record In bTable
>>
>>ans02={};
>>For[i=1,i<n,i++,
>>step1 = Position[bb,idx[[i]]];
>>AppendTo[ans02,step1]]
>>ans02=Flatten[ans02,1];
>>
>>Extract a Records by Index
>>
>>ans01 =Extract[currentBalance,ans01];
>>Dimensions[ans01]
>>
>>Flatten If Not A Matrix
>>
>>If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]];
>>Dimensions[ans01]
>>
>>Extract b Records by Index
>>
>>ans02 =Extract[interestBalance,ans02];
>>Dimensions[ans02]
>>
>>Flatten If Not A Matrix
>>
>>If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]];
>>Dimensions[ans02]
>>
>>ans01=matsort[ans01,loc01[[1]]];
>>ans02=matsort[ans02,loc04[[1]]];
>>noteTerms=RowJoin[ans02,ans01];
>>Dimensions[noteTerms]

Just a quick observation. Mixing floating point numbers with strings in
the same array is never a good idea in problems where performance matters
because it prevents the system creating packed arrays - which can make a
big difference.

David Bailey
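
To illustrate the packed-array point, here is a minimal sketch. The small table `mixed` and the names `numeric` and `strings` are made up for the example and are not taken from the original routine. It only shows that a table containing strings stays unpacked, while the numeric columns packed on their own do not; how much this helps in the actual problem would have to be measured.

```mathematica
(* Hypothetical 4 x 3 table: column 1 is a numeric ID, column 2 a string,
   column 3 a numeric amount. Names and data are illustrative only. *)
mixed = {{101, "open", 2500.}, {102, "closed", 130.5},
         {103, "open", 75.25}, {104, "open", 980.}};

(* A table that contains strings cannot be stored as a packed array;
   ToPackedArray simply returns it unchanged. *)
Developer`PackedArrayQ[Developer`ToPackedArray[mixed]]   (* False *)

(* Keep the numeric columns in a separate matrix, converted to machine
   reals so that all elements have the same type, and pack that matrix. *)
numeric = Developer`ToPackedArray[N[mixed[[All, {1, 3}]]]];

(* The string column is kept in its own list, in the same row order. *)
strings = mixed[[All, 2]];

Developer`PackedArrayQ[numeric]   (* True *)
```

With the data split this way, the searching and sorting on the ID column can work on the packed numeric matrix, and the string columns only need to be reattached by row position at the end.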