Re: Database like "left join" or SAS like Merge function
- To: mathgroup at smc.vnet.net
- Subject: [mg83608] Re: Database like "left join" or SAS like Merge function
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Sat, 24 Nov 2007 04:16:52 -0500 (EST)
- References: <fi6abc$oh3$1@smc.vnet.net>
On Nov 23, 5:34 am, Peter <pjcros... at gmail.com> wrote: > Is there a builtin function that can "join" (merge) two two- > dimensional data lists based on a single or multiple key column? > > This seems like such a basic requirement that I feel like I must be > missing something obvious. > > Thanks, Peter Hi Peter, I'm not aware of any built-in function that does what you want. Here's something I wrote for myself that may be related to what you're looking for. The main function is MergeByKey. It calls a helper function SubsetPosition. --Mark (* code starts here *) MergeByKey::usage = "MergeByKey[{data1, data2, ...}, n] merges \ the datasets treating the first n columns of each dataset \ as the key. If n is omitted, the first column is used as the key. \ MergeByKey[datasets, list] uses the list of column numbers to \ specify the key columns which are placed first in the merged dataset. \ Only rows that have matching keys in all datasets appear in the \ merged dataset. MergeByKey assumes there are no duplicate keys \ in each of the datasets." SubsetPosition::usage = "SubsetPosition[set, subset] returns the \ positions of subset in the set, assuming both set and subset are \ sorted and have no duplicates." MergeByKey[datalist:{__List}, n_:1] := Module[{i, keylist, ikeys, poslist, dlist}, i = Range[n]; keylist = #[[All, i]] & /@ datalist; ikeys = Intersection @@ keylist; poslist = SubsetPosition[#, ikeys]& /@ keylist; dlist = MapThread[#1[[#2, n + 1 ;;]] &, {datalist, poslist}]; Join @@@ Transpose[Prepend[dlist, ikeys]] ] MergeByKey[datalist:{__List}, index_List] := Module[{orderlist}, orderlist = Flatten[Join[index, Complement[Range[Length[#[[1]]]], index]]]& /@ datalist; MergeByKey[MapThread[#1[[All, #2]]&, {datalist, orderlist}], Length[index]] ] SubsetPosition[superset:{__Integer}, subset:{__Integer}] := sspCompiled[superset, subset] SubsetPosition[superset_, subset_] := Module[{i = 1}, (While[superset[[i]] != #, i++]; i++) & /@ subset ] sspCompiled = Compile[{ {superset, _Integer, 1}, {subset, _Integer, 1}}, Module[{i = 1}, (While[superset[[i]] != #, i++]; i++) & /@ subset ]] (* code ends here *)