Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

• To: mathgroup at smc.vnet.net
• Subject: [mg52471] Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
• From: David Bailey <dave at Remove_Thisdbailey.co.uk>
• Date: Sun, 28 Nov 2004 01:07:05 -0500 (EST)
• References: <co999c$h67$1@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```Tomas Garza wrote:
> Perhaps something could be done. Please explain your problem in more detail.
> Never mind your program (what is loc01?). What do you mean by joining rows?
> Give a small example with small matrices (say 5 x 2 or something like that).
> What are your present run times?
> Tomas Garza
> Mexico City
> ----- Original Message -----
> From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
> To: mathgroup at smc.vnet.net
> Subject: [mg52471]  Speeding UP Indexing and Joining of
> Different Size Rectangular Matrixes
>
>
>
>>Fellow MathGroup,
>>
>>I have been using Mathematica for financial analysis purposes and have been
>>developing notebook programs for about 5 years.
>>My skills at this are self-taught, with help from Wolfram training and
>>support.
>>The largest challenge has been the speed in the analysis of large data
>>sets.
>>The following is an example of a routine that takes many hours.
>>SHORTER.
>>
>>Equipment HP XP 3.24 processor 2 Gigs
>>Mathematica 5.01
>>Data set a= 257470 by 40, Mixed numeric and string fields, but each field
>>(column) is either numeric or string
>>Data set b= 258705 by 5, All fields are numeric
>>
>>Objective:  RowJoin the rows from each data set that have the same ID
>>field
>>in their corresponding column one.
>>
>>Thank you and Happy Holidays
>>
>>ROUTINE
>>Create Index By Invoice ID
>>
>>firstCol=loc01[[1]];
>>
>>lastCol =loc01[[1]];
>>
>>aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>>
>>Length[aa]
>>
>>257470
>>
>>firstCol=loc04[[1]];
>>
>>lastCol =loc04[[1]];
>>
>>bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
>>
>>Length[bb]
>>
>>258705
>>
>>idx=Intersection[aa,bb];
>>
>>Length[idx]
>>
>>257249
>>
>>n=Length[idx]+1
>>
>>257250
>>
>>Locate Position Of Each Record In aTable
>>
>>ans01={};
>>
>>For[i=1,i<n,i++,
>>
>>step1 = Position[aa,idx[[i]]];
>>
>>AppendTo[ans01,step1]]
>>
>>ans01=Flatten[ans01,1];
>>
>>Locate Position Of Each Record In bTable
>>
>>ans02={};
>>
>>For[i=1,i<n,i++,
>>
>>step1 = Position[bb,idx[[i]]];
>>
>>AppendTo[ans02,step1]]
>>
>>ans02=Flatten[ans02,1];
>>
>>Extract a Records by Index
>>
>>ans01 =Extract[currentBalance,ans01];
>>
>>Dimensions[ans01]
>>
>>Flatten If Not A Matrix
>>
>>If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]];
>>
>>Dimensions[ans01]
>>
>>Extract b Records by Index
>>
>>ans02 =Extract[interestBalance,ans02];
>>
>>Dimensions[ans02]
>>
>>Flatten If Not A Matrix
>>
>>If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]];
>>
>>Dimensions[ans02]
>>
>>ans01=matsort[ans01,loc01[[1]]];
>>
>>ans02=matsort[ans02,loc04[[1]]];
>>
>>noteTerms=RowJoin[ans02,ans01];
>>
>>Dimensions[noteTerms]
>>
>>
>>
>
>
>
Just a quick observation. Mixing floating point numbers with strings in
the same array is never a good idea in problems where performance
matters, because it prevents the system from creating packed arrays,
which can make a big difference.

David Bailey

```
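
As a rough illustration of the packed-array point above, here is a minimal sketch. The toy data and the names `mixed`, `ids`, `amounts` and `labels` are invented for this example and are not taken from the original poster's notebook. It only shows that a matrix containing strings is not packed, while its numeric columns can be packed once they are stored separately.

```mathematica
(* Hypothetical toy data: column 1 is an integer ID, column 2 a string, column 3 a real *)
mixed = {{101, "open", 2500.}, {102, "closed", 310.5}, {103, "open", 42.}};

(* A matrix that contains strings cannot be stored as a packed array *)
Developer`PackedArrayQ[mixed]   (* False *)

(* Store each numeric column on its own; the strings stay in an ordinary list *)
ids     = Developer`ToPackedArray[mixed[[All, 1]]];   (* packed machine integers *)
amounts = Developer`ToPackedArray[mixed[[All, 3]]];   (* packed machine reals *)
labels  = mixed[[All, 2]];

Developer`PackedArrayQ /@ {ids, amounts}   (* {True, True} *)
```

Whether this helps in the routine quoted above would depend on how much of the run time is spent on the numeric columns, so it is best treated as something to try and time rather than a guaranteed fix.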
