Re: Database like "left join" or SAS like Merge function
- To: mathgroup at smc.vnet.net
- Subject: [mg83623] Re: Database like "left join" or SAS like Merge function
- From: Albert <awnl at arcor.net>
- Date: Sun, 25 Nov 2007 04:34:49 -0500 (EST)
- References: <fi6abc$oh3$1@smc.vnet.net>
Hi Peter, > Is there a builtin function that can "join" (merge) two two- > dimensional data lists based on a single or multiple key column? > > This seems like such a basic requirement that I feel like I must be > missing something obvious. I also don't know about a built in function which would allow to do this and there is nothing wrong with the solutions that have been sent already. Still I couldn't resist to post another method which takes advantage of using lookup tables, which is something which comes, via pattern matching, as a built in to mathematica but is often not recognized as that. Here is a simple function that builds a lookup table from a list with keys from the n'th column of the list. Note that you will need a symbol which holds the lookup table in it's downvalues, so this is the first argument here: makeLookupTable[symbol_,list_,column_]:=( symbol[___]=Table[0.,{Length[list[[1]]-1]}]; Scan[ With[{key=#[[column]]},symbol[key]=Delete[#,column]]&, list ] ); Note that it uses a list with zeros as "default" if no key is found. Of course you might want a different behavior concerning the defaults depending on your use case. now here are two lists: m1=Table[{i,Random[],Random[]},{i,10}]; m2=Table[{Random[],Random[],Random[],i},{i,2,12}]; here I create two lookup tables using columns 1 respectively 4 as keys: makeLookupTable[lt1,m1,1] makeLookupTable[lt2,m2,4] these are all keys that are in either m1 or m2: allkeys=Union[m1[[All,1]],m2[[All,4]]] build the merged lists using all these keys: Join[{#},lt1[#],lt2[#]]&/@allkeys Note how the defaults are used for keys not existing in one of the two lists. Of course the details may vary and there are many possible variations. In general I found this a very useful programming pattern which allows me to write code which at least I can read easily and in many cases is also quite efficient, since the whole work of searching for keys is up to the pattern matcher, which is implemented at a low level and needs to do these things efficient to be of any use. Note that the keys are not limited at all but you can use any mathematica expression as keys, and by mixing with patterns you can implement defaults for non existing keys in a very powerful way. An earlier examination lead me to the conclusion that using plain strings as keys was very fast, while more complicated pattern were not so efficient, but this might have changed since version 4.x ... When combining this to a function, you could use local variables to hold the lookup tables and which are cleaned up automatically: mergelists[lst1_List,lst2_List,n1_:1,n2_:1]:=Module[{ lt1,lt2,keys }, makeLookupTable[lt1,lst1,n1]; makeLookupTable[lt2,lst2,n2]; keys=Union[lst1[[All,n1]],lst2[[All,n2]]]; Join[{#},lt1[#],lt2[#]]&/@keys ]; hth, albert