MathGroup Archive 2004

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re;Re; Speeding Up Indexing andJoining

  • To: mathgroup at smc.vnet.net
  • Subject: [mg52514] Re: [mg52478] Re;[mg52453] Re; [mg52414]Speeding Up Indexing andJoining
  • From: Tomas Garza <tgarza01 at prodigy.net.mx>
  • Date: Tue, 30 Nov 2004 05:24:33 -0500 (EST)
  • References: <200411280607.BAA06702@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

I think you can get a very substantial reduction in your run-time, down from 
"many, many hours" to less than a minute, perhaps. I borrowed the main idea 
from an example in the Help browser under Split (see the function defined 
therein, called "Classify"). This function, which I use in a very restricted 
sense, takes two lists and constructs disjoint sets of members of both lists 
which share some condition. In your case, it takes the joint list Join[a, b] 
and selects those members which have the same value in the first element. It 
goes on building these sets. In some cases, there will only be one element 
(where there is no element in b with the same first element as in a); in 
other cases, there will be two (when it finds an element in b which has the 
same value in the first position as an element in a). All you have to do 
then is to Select those cases with two elements, and you obtain your desired 
objective.
The function Classify is defined as

In[1]:=
Classify[s_,f_:Identity]:=
  Map[Last,Split[Sort[{f[#],#}&/@s],SameQ[First[#1],First[#2]]&],{2}]

Try to understand what's going on - it is really worth your while. Then, 
applying it to your example,

In[2]:=
a = {{123, "bob", 46, "fo"}, {133, "harry", 45, "fo"}, {165, "pete", 44,
"fo"}, {2569, "moe", 56, "fo"}, {6589, "ben", 69, "fo"}};


b = {{133, 1, 46, "go", 6}, {165, 88, 45, "mo", 7}, {166, 53, 44, "do",
9}, {25, 82, 56, "ho", 9}, {6589, 77, 69, "xo", 11}, {6570, 77, 69,
"xo", 11}, {6571, 77, 69, "xo", 11}, {6572, 77, 69, "x0", 11}};


In[4]:=
pp = Join[b, a];


In[5]:=
gg[s_] := s[[{1}]]
t = Classify[pp, gg]


Out[6]=
{{{25, 82, 56, ho, 9}}, {{123, bob, 46, fo}}, {{133, harry, 45, fo}, {133, 
1,
46, go, 6}}, {{165, pete, 44, fo}, {165, 88, 45, mo, 7}}, {{166, 53, 44,
do, 9}}, {{2569, moe, 56, fo}}, {{6570, 77, 69, xo, 11}}, {{6571, 77,
69, xo, 11}}, {{6572, 77, 69, x0, 11}}, {{6589, ben, 69, fo}, {6589, 77,
69, xo, 11}}}

Then Select those lists which have two elements, and join the two elements:

In[7]:=
Join[#[[2]],#[[1]]]&/@(Select[t,Length[#]>1&])


Out[7]=
{{133,1,46,go,6,133,harry,45,fo},{165,88,45,mo,7,165,pete,44,fo},{6589,77,69,
    xo,11,6589,ben,69,fo}}

which is the result you were after. I have applied this to a problem where a 
has over 280,000 records, each with some 40 elements, and b has around 
40,000 records, each with 5 or 6 elements. I obtained the result in about 20 
seconds. Notice that I avoid the use of the term "matrices"; it is quite 
unnecesary, since your lists inside a and b can have differents lengths, and 
their elements may not be numbers. There is nothing in the problem that has 
anything to do with concepts in matrix theory. We use mere lists...

This is a beautiful example of the use of functional programming. It also 
illustrates the benefits of using the Help browser and studying the examples 
therein.

Tomas Garza
Mexico City
----- Original Message ----- 
From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
To: mathgroup at smc.vnet.net
Subject: [mg52514] [mg52478] Re;[mg52453] Re; [mg52414]Speeding Up Indexing and 
Joining


> Fellow MathGroup,
>
> Here is a simple example.
>
> Again on large mattices, it takes many many hours
>
> Thank you
>
>
>
> In[31]:=
>
> a={{123,"bob",46,"fo"},{133,"harry",45,"fo"},{
>
> 165,"pete",44,"fo"},{2569,"moe",56,"fo"},{6589,"ben",69,"fo"}};
>
> In[32]:=
>
> b={{133,1,46,"go",6},{165,88,45,"mo",7},{
>
> 166,53,44,"do",9},{25,82,56,"ho",9},{6589,77,69,"xo",11},{6570,77,69,"
>
> xo",11},{6571,77,69,"xo",11},{6572,77,69,"xo",11}};
>
> ROUTINE
>
> Create Index from Intersection of the first columns of matrix a and =
> matrix b
>
> firstCol=1;
>
> lastCol =1;
>
> aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>
> bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
> idx=Intersection[aa,bb];
>
> In[40]:=
>
> idx=Intersection[aa,bb]
>
> Out[40]=
>
> {{133},{165},{6589}}
>
> Select Records (Rows) from both matrix a and matrix b that equal the =
> idx.
> The idx consists of numerics
>
> Locate Position Of Each Record In each matrix using idx
>
> n=Length[idx]+1;
>
> ans01={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[aa,idx[[i]]];
>
> AppendTo[ans01,step1]]
>
> n=Length[idx]+1;
>
> ans02={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[bb,idx[[i]]];
>
> AppendTo[ans02,step1]]
>
> Extract a Records
>
> ans01 =Extract[a,ans01];
>
> ans02 =Extract[b,ans02];
>
>
>
> Sort each matrix by Column 1 using a defined function
>
> ans01=matsort[ans01,1]; ans02=matsort[ans02,1];
>
> In[69]:=
>
> final=RowJoin[ans02,ans01]
>
> Out[69]= =
> {{133,1,46,go,6,133,harry,45,fo},{165,88,45,mo,7,165,pete,44,fo},{
>
> 6589,77,69,xo,11,6589,ben,69,fo}}
>
> -----Original Message-----
>
> From: Tomas Garza [ <mailto:tgarza01 at prodigy.net.mx>
To: mathgroup at smc.vnet.net
> To: mathgroup at smc.vnet.net
> mailto:tgarza01 at prodigy.net.mx]
>
>
>
> Subject: [mg52514] [mg52478] [mg52453] Re: [mg52414] Speeding UP Indexing and 
> Joining
> ofDifferentSizeRectangular Matrixes
>
>
>
> Perhaps something could be done. Please explain your problem in more 
> detail.
>
>
> Never mind your program (what is loc01?). What do you mean by joining 
> rows?
>
> Give a small example with small matrices (say 5 x 2 or something like 
> that).
>
>
> What are your present run times?
>
> Tomas Garza
>
> Mexico City
>
> ----- Original Message -----
>
> From: "Benedetto Bongiorno" <bbongiorno at attglobal.net>
To: mathgroup at smc.vnet.net
> To: mathgroup at smc.vnet.net
>
>
> Subject: [mg52514] [mg52478] [mg52453] [mg52414] Speeding UP Indexing and Joining of
>
> DifferentSizeRectangular Matrixes
>
>
>
>>
>
>> Fellow MathGroup,
>
>>
>
>> I have been using Mathematica for financial analysis purposes and have
>
>> been developing note book programs for about 5 years.
>
>> My skills at this are self taught with help from Wolfram training and
>
>> support.
>
>> The largest challenge has been the speed in the analysis of large data
>
>> sets.
>
>> The following is an example of a routine that takes many hours.
>
>> PLEASE HELP AND SHOW HOW I CAN IMPROVE THE ROUTINE TO MAKE THE RUN
> TIME
>
>> SHORTER.
>
>>
>
>> Equipment HP XP 3.24 processor 2 Gigs
>
>> Mathematica 5.01
>
>> Data set a= 257470 by 40, Mixed numeric and string fields, but each
>
>> field
>
>> (column) is either or numeric or string
>
>> Data set b= 258705 by 5, All fields are numeric
>
>>
>
>> Objective: RowJoin the rows from each data set that have the same ID
>
>> field in their corresponding column one.
>
>>
>
>> Thank you and Happy Holidays
>
>>
>
>> ROUTINE
>
>> Create Index By Invoice ID
>
>>
>
>> firstCol=loc01[[1]];
>
>>
>
>> lastCol =loc01[[1]];
>
>>
>
>> aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>
>>
>
>> Length[aa]
>
>>
>
>> 257470
>
>>
>
>> firstCol=loc04[[1]];
>
>>
>
>> lastCol =loc04[[1]];
>
>>
>
>> bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
>
>>
>
>> Length[bb]
>
>>
>
>> 258705
>
>>
>
>> idx=Intersection[aa,bb];
>
>>
>
>> Length[idx]
>
>>
>
>> 257249
>
>>
>
>> n=Length[idx]+1
>
>>
>
>> 257250
>
>>
>
>> Locate Position Of Each Record In aTable
>
>>
>
>> ans01={};
>
>>
>
>> For[i=1,i<n,i++,
>
>>
>
>> step1 = Position[aa,idx[[i]]];
>
>>
>
>> AppendTo[ans01,step1]]
>
>>
>
>> ans01=Flatten[ans01,1];
>
>>
>
>> Locate Position Of Each Record In bTable
>
>>
>
>> ans02={};
>
>>
>
>> For[i=1,i<n,i++,
>
>>
>
>> step1 = Position[bb,idx[[i]]];
>
>>
>
>> AppendTo[ans02,step1]]
>
>>
>
>> ans02=Flatten[ans02,1];
>
>>
>
>> Extract a Records by Index
>
>>
>
>> ans01 =Extract[currentBalance,ans01];
>
>>
>
>> Dimensions[ans01]
>
>>
>
>> Flatten If Not A Matrix
>
>>
>
>> If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]];
>
>>
>
>> Dimensions[ans01]
>
>>
>
>> Extract b Records by Index
>
>>
>
>> ans02 =Extract[interestBalance,ans02];
>
>>
>
>> Dimensions[ans02]
>
>>
>
>> Flatten If Not A Matrix
>
>>
>
>> If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]];
>
>>
>
>> Dimensions[ans02]
>
>>
>
>> ans01=matsort[ans01,loc01[[1]]];
>
>>
>
>> ans02=matsort[ans02,loc04[[1]]];
>
>>
>
>> noteTerms=RowJoin[ans02,ans01];
>
>>
>
>> Dimensions[noteTerms]
>
>>
>
>>
>
>>
>
>
>
>
>
>
> 



  • Prev by Date: Re: launching a kernel on a remote linux machine through ssh from a linux machine
  • Next by Date: Re: pair sums applied to trignometry sums
  • Previous by thread: Re;Re; Speeding Up Indexing and Joining
  • Next by thread: Help with dynamics