Re: Re;Re; Speeding Up Indexing andJoining

*To*: mathgroup at smc.vnet.net*Subject*: [mg52514] Re: [mg52478] Re;[mg52453] Re; [mg52414]Speeding Up Indexing andJoining*From*: Tomas Garza <tgarza01 at prodigy.net.mx>*Date*: Tue, 30 Nov 2004 05:24:33 -0500 (EST)*References*: <200411280607.BAA06702@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

I think you can get a very substantial reduction in your run-time, down from "many, many hours" to less than a minute, perhaps. I borrowed the main idea from an example in the Help browser under Split (see the function defined therein, called "Classify"). This function, which I use in a very restricted sense, takes two lists and constructs disjoint sets of members of both lists which share some condition. In your case, it takes the joint list Join[a, b] and selects those members which have the same value in the first element. It goes on building these sets. In some cases, there will only be one element (where there is no element in b with the same first element as in a); in other cases, there will be two (when it finds an element in b which has the same value in the first position as an element in a). All you have to do then is to Select those cases with two elements, and you obtain your desired objective. The function Classify is defined as In[1]:= Classify[s_,f_:Identity]:= Map[Last,Split[Sort[{f[#],#}&/@s],SameQ[First[#1],First[#2]]&],{2}] Try to understand what's going on - it is really worth your while. Then, applying it to your example, In[2]:= a = {{123, "bob", 46, "fo"}, {133, "harry", 45, "fo"}, {165, "pete", 44, "fo"}, {2569, "moe", 56, "fo"}, {6589, "ben", 69, "fo"}}; b = {{133, 1, 46, "go", 6}, {165, 88, 45, "mo", 7}, {166, 53, 44, "do", 9}, {25, 82, 56, "ho", 9}, {6589, 77, 69, "xo", 11}, {6570, 77, 69, "xo", 11}, {6571, 77, 69, "xo", 11}, {6572, 77, 69, "x0", 11}}; In[4]:= pp = Join[b, a]; In[5]:= gg[s_] := s[[{1}]] t = Classify[pp, gg] Out[6]= {{{25, 82, 56, ho, 9}}, {{123, bob, 46, fo}}, {{133, harry, 45, fo}, {133, 1, 46, go, 6}}, {{165, pete, 44, fo}, {165, 88, 45, mo, 7}}, {{166, 53, 44, do, 9}}, {{2569, moe, 56, fo}}, {{6570, 77, 69, xo, 11}}, {{6571, 77, 69, xo, 11}}, {{6572, 77, 69, x0, 11}}, {{6589, ben, 69, fo}, {6589, 77, 69, xo, 11}}} Then Select those lists which have two elements, and join the two elements: In[7]:= Join[#[[2]],#[[1]]]&/@(Select[t,Length[#]>1&]) Out[7]= {{133,1,46,go,6,133,harry,45,fo},{165,88,45,mo,7,165,pete,44,fo},{6589,77,69, xo,11,6589,ben,69,fo}} which is the result you were after. I have applied this to a problem where a has over 280,000 records, each with some 40 elements, and b has around 40,000 records, each with 5 or 6 elements. I obtained the result in about 20 seconds. Notice that I avoid the use of the term "matrices"; it is quite unnecesary, since your lists inside a and b can have differents lengths, and their elements may not be numbers. There is nothing in the problem that has anything to do with concepts in matrix theory. We use mere lists... This is a beautiful example of the use of functional programming. It also illustrates the benefits of using the Help browser and studying the examples therein. Tomas Garza Mexico City ----- Original Message ----- From: "Benedetto Bongiorno" <bbongiorno at attglobal.net> To: mathgroup at smc.vnet.net Subject: [mg52514] [mg52478] Re;[mg52453] Re; [mg52414]Speeding Up Indexing and Joining > Fellow MathGroup, > > Here is a simple example. > > Again on large mattices, it takes many many hours > > Thank you > > > > In[31]:= > > a={{123,"bob",46,"fo"},{133,"harry",45,"fo"},{ > > 165,"pete",44,"fo"},{2569,"moe",56,"fo"},{6589,"ben",69,"fo"}}; > > In[32]:= > > b={{133,1,46,"go",6},{165,88,45,"mo",7},{ > > 166,53,44,"do",9},{25,82,56,"ho",9},{6589,77,69,"xo",11},{6570,77,69," > > xo",11},{6571,77,69,"xo",11},{6572,77,69,"xo",11}}; > > ROUTINE > > Create Index from Intersection of the first columns of matrix a and = > matrix b > > firstCol=1; > > lastCol =1; > > aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]]; > > bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]]; > idx=Intersection[aa,bb]; > > In[40]:= > > idx=Intersection[aa,bb] > > Out[40]= > > {{133},{165},{6589}} > > Select Records (Rows) from both matrix a and matrix b that equal the = > idx. > The idx consists of numerics > > Locate Position Of Each Record In each matrix using idx > > n=Length[idx]+1; > > ans01={}; > > For[i=1,i<n,i++, > > step1 = Position[aa,idx[[i]]]; > > AppendTo[ans01,step1]] > > n=Length[idx]+1; > > ans02={}; > > For[i=1,i<n,i++, > > step1 = Position[bb,idx[[i]]]; > > AppendTo[ans02,step1]] > > Extract a Records > > ans01 =Extract[a,ans01]; > > ans02 =Extract[b,ans02]; > > > > Sort each matrix by Column 1 using a defined function > > ans01=matsort[ans01,1]; ans02=matsort[ans02,1]; > > In[69]:= > > final=RowJoin[ans02,ans01] > > Out[69]= = > {{133,1,46,go,6,133,harry,45,fo},{165,88,45,mo,7,165,pete,44,fo},{ > > 6589,77,69,xo,11,6589,ben,69,fo}} > > -----Original Message----- > > From: Tomas Garza [ <mailto:tgarza01 at prodigy.net.mx> To: mathgroup at smc.vnet.net > To: mathgroup at smc.vnet.net > mailto:tgarza01 at prodigy.net.mx] > > > > Subject: [mg52514] [mg52478] [mg52453] Re: [mg52414] Speeding UP Indexing and > Joining > ofDifferentSizeRectangular Matrixes > > > > Perhaps something could be done. Please explain your problem in more > detail. > > > Never mind your program (what is loc01?). What do you mean by joining > rows? > > Give a small example with small matrices (say 5 x 2 or something like > that). > > > What are your present run times? > > Tomas Garza > > Mexico City > > ----- Original Message ----- > > From: "Benedetto Bongiorno" <bbongiorno at attglobal.net> To: mathgroup at smc.vnet.net > To: mathgroup at smc.vnet.net > > > Subject: [mg52514] [mg52478] [mg52453] [mg52414] Speeding UP Indexing and Joining of > > DifferentSizeRectangular Matrixes > > > >> > >> Fellow MathGroup, > >> > >> I have been using Mathematica for financial analysis purposes and have > >> been developing note book programs for about 5 years. > >> My skills at this are self taught with help from Wolfram training and > >> support. > >> The largest challenge has been the speed in the analysis of large data > >> sets. > >> The following is an example of a routine that takes many hours. > >> PLEASE HELP AND SHOW HOW I CAN IMPROVE THE ROUTINE TO MAKE THE RUN > TIME > >> SHORTER. > >> > >> Equipment HP XP 3.24 processor 2 Gigs > >> Mathematica 5.01 > >> Data set a= 257470 by 40, Mixed numeric and string fields, but each > >> field > >> (column) is either or numeric or string > >> Data set b= 258705 by 5, All fields are numeric > >> > >> Objective: RowJoin the rows from each data set that have the same ID > >> field in their corresponding column one. > >> > >> Thank you and Happy Holidays > >> > >> ROUTINE > >> Create Index By Invoice ID > >> > >> firstCol=loc01[[1]]; > >> > >> lastCol =loc01[[1]]; > >> > >> aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]]; > >> > >> Length[aa] > >> > >> 257470 > >> > >> firstCol=loc04[[1]]; > >> > >> lastCol =loc04[[1]]; > >> > >> bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]]; > >> > >> Length[bb] > >> > >> 258705 > >> > >> idx=Intersection[aa,bb]; > >> > >> Length[idx] > >> > >> 257249 > >> > >> n=Length[idx]+1 > >> > >> 257250 > >> > >> Locate Position Of Each Record In aTable > >> > >> ans01={}; > >> > >> For[i=1,i<n,i++, > >> > >> step1 = Position[aa,idx[[i]]]; > >> > >> AppendTo[ans01,step1]] > >> > >> ans01=Flatten[ans01,1]; > >> > >> Locate Position Of Each Record In bTable > >> > >> ans02={}; > >> > >> For[i=1,i<n,i++, > >> > >> step1 = Position[bb,idx[[i]]]; > >> > >> AppendTo[ans02,step1]] > >> > >> ans02=Flatten[ans02,1]; > >> > >> Extract a Records by Index > >> > >> ans01 =Extract[currentBalance,ans01]; > >> > >> Dimensions[ans01] > >> > >> Flatten If Not A Matrix > >> > >> If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]]; > >> > >> Dimensions[ans01] > >> > >> Extract b Records by Index > >> > >> ans02 =Extract[interestBalance,ans02]; > >> > >> Dimensions[ans02] > >> > >> Flatten If Not A Matrix > >> > >> If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]]; > >> > >> Dimensions[ans02] > >> > >> ans01=matsort[ans01,loc01[[1]]]; > >> > >> ans02=matsort[ans02,loc04[[1]]]; > >> > >> noteTerms=RowJoin[ans02,ans01]; > >> > >> Dimensions[noteTerms] > >> > >> > >> > > > > > > >

**References**:**Re;Re; Speeding Up Indexing and Joining***From:*"Benedetto Bongiorno" <bbongiorno@attglobal.net>

**Re: launching a kernel on a remote linux machine through ssh from a linux machine**

**Re: pair sums applied to trignometry sums**

**Re;Re; Speeding Up Indexing and Joining**

**Help with dynamics**