Re: Database like "left join" or SAS like Merge function
- To: mathgroup at smc.vnet.net
- Subject: [mg83623] Re: Database like "left join" or SAS like Merge function
- From: Albert <awnl at arcor.net>
- Date: Sun, 25 Nov 2007 04:34:49 -0500 (EST)
- References: <fi6abc$oh3$1@smc.vnet.net>
Hi Peter,
> Is there a builtin function that can "join" (merge) two two-
> dimensional data lists based on a single or multiple key column?
>
> This seems like such a basic requirement that I feel like I must be
> missing something obvious.
I also don't know about a built in function which would allow to do this
and there is nothing wrong with the solutions that have been sent
already. Still I couldn't resist to post another method which takes
advantage of using lookup tables, which is something which comes, via
pattern matching, as a built in to mathematica but is often not
recognized as that. Here is a simple function that builds a lookup table
from a list with keys from the n'th column of the list. Note that you
will need a symbol which holds the lookup table in it's downvalues, so
this is the first argument here:
makeLookupTable[symbol_,list_,column_]:=(
symbol[___]=Table[0.,{Length[list[[1]]-1]}];
Scan[
With[{key=#[[column]]},symbol[key]=Delete[#,column]]&,
list
]
);
Note that it uses a list with zeros as "default" if no key is found. Of
course you might want a different behavior concerning the defaults
depending on your use case.
now here are two lists:
m1=Table[{i,Random[],Random[]},{i,10}];
m2=Table[{Random[],Random[],Random[],i},{i,2,12}];
here I create two lookup tables using columns 1 respectively 4 as keys:
makeLookupTable[lt1,m1,1]
makeLookupTable[lt2,m2,4]
these are all keys that are in either m1 or m2:
allkeys=Union[m1[[All,1]],m2[[All,4]]]
build the merged lists using all these keys:
Join[{#},lt1[#],lt2[#]]&/@allkeys
Note how the defaults are used for keys not existing in one of the two
lists.
Of course the details may vary and there are many possible variations.
In general I found this a very useful programming pattern which allows
me to write code which at least I can read easily and in many cases is
also quite efficient, since the whole work of searching for keys is up
to the pattern matcher, which is implemented at a low level and needs to
do these things efficient to be of any use. Note that the keys are not
limited at all but you can use any mathematica expression as keys, and
by mixing with patterns you can implement defaults for non existing keys
in a very powerful way. An earlier examination lead me to the conclusion
that using plain strings as keys was very fast, while more complicated
pattern were not so efficient, but this might have changed since version
4.x ...
When combining this to a function, you could use local variables to hold
the lookup tables and which are cleaned up automatically:
mergelists[lst1_List,lst2_List,n1_:1,n2_:1]:=Module[{
lt1,lt2,keys
},
makeLookupTable[lt1,lst1,n1];
makeLookupTable[lt2,lst2,n2];
keys=Union[lst1[[All,n1]],lst2[[All,n2]]];
Join[{#},lt1[#],lt2[#]]&/@keys
];
hth,
albert