MathGroup Archive 2007

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Database like "left join" or SAS like Merge function

  • To: mathgroup at smc.vnet.net
  • Subject: [mg83623] Re: Database like "left join" or SAS like Merge function
  • From: Albert <awnl at arcor.net>
  • Date: Sun, 25 Nov 2007 04:34:49 -0500 (EST)
  • References: <fi6abc$oh3$1@smc.vnet.net>

Hi Peter,

> Is there a builtin function that can "join" (merge) two two-
> dimensional data lists based on a single or multiple key column?
> 
> This seems like such a basic requirement that I feel like I must be
> missing something obvious.

I also don't know about a built in function which would allow to do this 
and there is nothing wrong with the solutions that have been sent 
already. Still I couldn't resist to post another method which takes 
advantage of using lookup tables, which is something which comes, via 
pattern matching, as a built in to mathematica but is often not 
recognized as that. Here is a simple function that builds a lookup table 
from a list with keys from the n'th column of the list. Note that you 
will need a symbol which holds the lookup table in it's downvalues, so 
this is the first argument here:

makeLookupTable[symbol_,list_,column_]:=(
symbol[___]=Table[0.,{Length[list[[1]]-1]}];
Scan[
With[{key=#[[column]]},symbol[key]=Delete[#,column]]&,
list
]
);

Note that it uses a list with zeros as "default" if no key is found. Of 
course you might want a different behavior concerning the defaults 
depending on your use case.

now here are two lists:

m1=Table[{i,Random[],Random[]},{i,10}];
m2=Table[{Random[],Random[],Random[],i},{i,2,12}];

here I create two lookup tables using columns 1 respectively 4 as keys:

makeLookupTable[lt1,m1,1]
makeLookupTable[lt2,m2,4]

these are all keys that are in either m1 or m2:

allkeys=Union[m1[[All,1]],m2[[All,4]]]

build the merged lists using all these keys:

Join[{#},lt1[#],lt2[#]]&/@allkeys

Note how the defaults are used for keys not existing in one of the two 
lists.

Of course the details may vary and there are many possible variations. 
In general I found this a very useful programming pattern which allows 
me to write code which at least I can read easily and in many cases is 
also quite efficient, since the whole work of searching for keys is up 
to the pattern matcher, which is implemented at a low level and needs to 
do these things efficient to be of any use. Note that the keys are not 
limited at all but you can use any mathematica expression as keys, and 
by mixing with patterns you can implement defaults for non existing keys 
in a very powerful way. An earlier examination lead me to the conclusion 
that using plain strings as keys was very fast, while more complicated 
pattern were not so efficient, but this might have changed since version 
4.x ...

When combining this to a function, you could use local variables to hold 
the lookup tables and which are cleaned up automatically:

mergelists[lst1_List,lst2_List,n1_:1,n2_:1]:=Module[{
     lt1,lt2,keys
   },
   makeLookupTable[lt1,lst1,n1];
   makeLookupTable[lt2,lst2,n2];
   keys=Union[lst1[[All,n1]],lst2[[All,n2]]];
   Join[{#},lt1[#],lt2[#]]&/@keys
];

hth,

albert


  • Prev by Date: Re: Locator 3D
  • Next by Date: Re: Button[label, action] - InputString[]?
  • Previous by thread: Re: Database like "left join" or SAS like Merge function
  • Next by thread: Re: Mathematica v5 versus v6