MathGroup Archive: November 2004 [00667]

[Date Index] [Thread Index] [Author Index]

Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

To: mathgroup at smc.vnet.net
Subject: [mg52424] Re: [mg52414] Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
From: DrBob <drbob at bigfoot.com>
Date: Fri, 26 Nov 2004 06:20:31 -0500 (EST)
References: <200411260604.BAA24195@smc.vnet.net>
Reply-to: drbob at bigfoot.com
Sender: owner-wri-mathgroup at wolfram.com

Here's an approach you might try:

Define lookup functions for a and b. When this is done for each dataset, the original dataset is no longer needed. It might actually be more efficient to do this while reading in the data (assuming it came from a file), so that the data is never stored in a matrix at all. (I'm assuming the index is in column 1.)

Clear@aFind
aFind[_] := {}
aNDX = (aFind[First@#] = Rest@#; First@#) & /@ a
Clear@a

Clear@bFind
bFind[_] := {}
bNDX = (bFind[First@#] = Rest@#; First@#) & /@ b
Clear@b

Now we can loop through indices in aNDX or bNDX to reach all the data, or we can loop through

abNDX = Intersection[aNDX, bNDX]

to see only the records in both datasets.

The value of extract will be the concatenation of both dataset rows. It will look like {{data from table a},{data from table b}}, {{},{data from table b}} or {{data from table a}, {}}, depending on whether x is in both datasets, dataset b only, or a only.

extract[x_] := {aFind[x], bFind[x]}

This replaces your For loop, AppendTo, and Extract. It takes zero time to define this.

The nth column of a record in a or b can be accessed using Part in a couple of different ways:

aFind[x_][[n]] == extract[x][[1,n]]
bFind[x_][[n]] == extract[x][[2,n]]

The first will fail (in either form) if x isn't an index in dataset a, and the second will fail (in either form) if it isn't in b.

I have no idea what Flatten is doing for you, what the variables interestBalance and currentBalance are, or what you intend matsort to do.

Bobby

On Fri, 26 Nov 2004 01:04:53 -0500 (EST), Benedetto Bongiorno <bbongiorno at attglobal.net> wrote:

>
> Fellow MathGroup,
>
> I have been using Mathematica for financial analysis purposes and have been
> developing note book programs for about 5 years.
> My skills at this are self taught with help from Wolfram training and support.
> The largest challenge has been the speed in the analysis of large data sets.
> The following is an example of a routine that takes many hours.
> PLEASE HELP AND SHOW HOW I CAN IMPROVE THE ROUTINE TO MAKE THE RUN TIME
> SHORTER.
>
> Equipment HP XP 3.24 processor 2 Gigs
> Mathematica 5.01
> Data set a= 257470 by 40, Mixed numeric and string fields, but each field
> (column) is either or numeric or string
> Data set b= 258705 by 5, All fields are numeric
>
> Objective:  RowJoin the rows from each data set that have the same ID field
> in their corresponding column one.
>
> Thank you and Happy Holidays
>
> ROUTINE
> Create Index By Invoice ID
>
> firstCol=loc01[[1]];
>
> lastCol =loc01[[1]];
>
> aa = Transpose[Take[Transpose[a],{firstCol, lastCol}]];
>
> Length[aa]
>
> 257470
>
> firstCol=loc04[[1]];
>
> lastCol =loc04[[1]];
>
> bb = Transpose[Take[Transpose[b],{firstCol, lastCol}]];
>
> Length[bb]
>
> 258705
>
> idx=Intersection[aa,bb];
>
> Length[idx]
>
> 257249
>
> n=Length[idx]+1
>
> 257250
>
> Locate Position Of Each Record In aTable
>
> ans01={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[aa,idx[[i]]];
>
> AppendTo[ans01,step1]]
>
> ans01=Flatten[ans01,1];
>
> Locate Position Of Each Record In bTable
>
> ans02={};
>
> For[i=1,i<n,i++,
>
> step1 = Position[bb,idx[[i]]];
>
> AppendTo[ans02,step1]]
>
> ans02=Flatten[ans02,1];
>
> Extract a Records by Index
>
> ans01 =Extract[currentBalance,ans01];
>
> Dimensions[ans01]
>
> Flatten If Not A Matrix
>
> If[MatrixQ[ans01],ans01=ans01,ans01=Flatten[ans01,1]];
>
> Dimensions[ans01]
>
> Extract b Records by Index
>
> ans02 =Extract[interestBalance,ans02];
>
> Dimensions[ans02]
>
> Flatten If Not A Matrix
>
> If[MatrixQ[ans02],ans02=ans02,ans02=Flatten[ans02,1]];
>
> Dimensions[ans02]
>
> ans01=matsort[ans01,loc01[[1]]];
>
> ans02=matsort[ans02,loc04[[1]]];
>
> noteTerms=RowJoin[ans02,ans01];
>
> Dimensions[noteTerms]
>
>
>
>
>



-- 
DrBob at bigfoot.com
www.eclecticdreams.net

References:
- Speeding UP Indexing and Joining of Different Size Rectangular Matrixes
  - From: "Benedetto Bongiorno" <bbongiorno@attglobal.net>

Prev by Date: Re: Using the same symbol for different quantities

Next by Date: Re: Non-algebraic solution

Previous by thread: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes

Next by thread: Re: Speeding UP Indexing and Joining of Different Size Rectangular Matrixes