Re;Re; Speeding Up Indexing and Joining
- To: mathgroup at smc.vnet.net
- Subject: [mg52478] Re;[mg52453] Re; [mg52414]Speeding Up Indexing and Joining
- From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
- Date: Sun, 28 Nov 2004 01:07:27 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Fellow MathGroup,
Here is a simple example.
Again on large mattices, it takes many many hours
Thank you
In[31]:=
a={{123,"bob",46,"fo"},{133,"harry",45,"fo"},{
165,"pete",44,"fo"},{2569,"moe",56,"fo"},{6589,"ben",69,"fo"}};
In[32]:=
b={{133,1,46,"go",6},{165,88,45,"mo",7},{
166,53,44,"do",9},{25,82,56,"ho",9},{6589,77,69,"xo",11},{6570,77,69,"
xo",11},{6571,77,69,"xo",11},{6572,77,69,"xo",11}};
ROUTINE
Create Index from Intersection of the first columns of matrix a and =
matrix b
firstCol=1;
lastCol =1;
aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
idx=Intersection[aa,bb];
In[40]:=
idx=Intersection[aa,bb]
Out[40]=
{{133},{165},{6589}}
Select Records (Rows) from both matrix a and matrix b that equal the =
idx.
The idx consists of numerics
Locate Position Of Each Record In each matrix using idx
n=Length[idx]+1;
ans01={};
For[i=1,i<n,i++,
step1 = Position[aa,idx[[i]]];
AppendTo[ans01,step1]]
n=Length[idx]+1;
ans02={};
For[i=1,i<n,i++,
step1 = Position[bb,idx[[i]]];
AppendTo[ans02,step1]]
Extract a Records
ans01 =Extract[a,ans01];
ans02 =Extract[b,ans02];
Sort each matrix by Column 1 using a defined function
ans01=matsort[ans01,1]; ans02=matsort[ans02,1];
In[69]:=
final=RowJoin[ans02,ans01]
Out[69]= =
{{133,1,46,go,6,133,harry,45,fo},{165,88,45,mo,7,165,pete,44,fo},{
6589,77,69,xo,11,6589,ben,69,fo}}
-----Original Message-----
From: Tomas Garza [ <mailto:tgarza01 at prodigy.net.mx>
To: mathgroup at smc.vnet.net
mailto:tgarza01 at prodigy.net.mx]
Subject: [mg52478] [mg52453] Re: [mg52414] Speeding UP Indexing and Joining
ofDifferentSizeRectangular Matrixes
Perhaps something could be done. Please explain your problem in more detail.
Never mind your program (what is loc01?). What do you mean by joining rows?
Give a small example with small matrices (say 5 x 2 or something like that).
What are your present run times?
Tomas Garza
Mexico City
----- Original Message -----
From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
To: mathgroup at smc.vnet.net
Subject: [mg52478] [mg52453] [mg52414] Speeding UP Indexing and Joining of
DifferentSizeRectangular Matrixes
>
> Fellow MathGroup,
>
> I have been using Mathematica for financial analysis purposes and have
> been developing note book programs for about 5 years.
> My skills at this are self taught with help from Wolfram training and
> support.
> The largest challenge has been the speed in the analysis of large data
> sets.
> The following is an example of a routine that takes many hours.
> PLEASE HELP AND SHOW HOW I CAN IMPROVE THE ROUTINE TO MAKE THE RUN
TIME
> SHORTER.
>
> Equipment HP XP 3.24 processor 2 Gigs
> Mathematica 5.01
> Data set a= 257470 by 40, Mixed numeric and string fields, but each
> field
> (column) is either or numeric or string
> Data set b= 258705 by 5, All fields are numeric
>
> Objective: RowJoin the rows from each data set that have the same ID
> field in their corresponding column one.
>
> Thank you and Happy Holidays
>
> ROUTINE
> Create Index By Invoice ID
>
> firstCol=loc01[[1]];
>
> lastCol =loc01[[1]];
>
> aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>
> Length[aa]
>
> 257470
>
> firstCol=loc04[[1]];
>
> lastCol =loc04[[1]];
>
> bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
>
> Length[bb]
>
> 258705
>
> idx=Intersection[aa,bb];
>
> Length[idx]
>
> 257249
>
> n=Length[idx]+1
>
> 257250
>
> Locate Position Of Each Record In aTable
>
> ans01={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[aa,idx[[i]]];
>
> AppendTo[ans01,step1]]
>
> ans01=Flatten[ans01,1];
>
> Locate Position Of Each Record In bTable
>
> ans02={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[bb,idx[[i]]];
>
> AppendTo[ans02,step1]]
>
> ans02=Flatten[ans02,1];
>
> Extract a Records by Index
>
> ans01 =Extract[currentBalance,ans01];
>
> Dimensions[ans01]
>
> Flatten If Not A Matrix
>
> If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]];
>
> Dimensions[ans01]
>
> Extract b Records by Index
>
> ans02 =Extract[interestBalance,ans02];
>
> Dimensions[ans02]
>
> Flatten If Not A Matrix
>
> If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]];
>
> Dimensions[ans02]
>
> ans01=matsort[ans01,loc01[[1]]];
>
> ans02=matsort[ans02,loc04[[1]]];
>
> noteTerms=RowJoin[ans02,ans01];
>
> Dimensions[noteTerms]
>
>
>
- Follow-Ups:
- Re: Re;Re; Speeding Up Indexing andJoining
- From: Tomas Garza <tgarza01@prodigy.net.mx>
- Re: Re;Re; Speeding Up Indexing andJoining