Re: Database like "left join" or SAS like Merge function
- To: mathgroup at smc.vnet.net
- Subject: [mg83608] Re: Database like "left join" or SAS like Merge function
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Sat, 24 Nov 2007 04:16:52 -0500 (EST)
- References: <fi6abc$oh3$1@smc.vnet.net>
On Nov 23, 5:34 am, Peter <pjcros... at gmail.com> wrote:
> Is there a builtin function that can "join" (merge) two two-
> dimensional data lists based on a single or multiple key column?
>
> This seems like such a basic requirement that I feel like I must be
> missing something obvious.
>
> Thanks, Peter
Hi Peter,
I'm not aware of any built-in function that does what you want. Here's
something I wrote for myself that may be related to what you're
looking for. The main function is MergeByKey. It calls a helper
function SubsetPosition.
--Mark
(* code starts here *)
MergeByKey::usage = "MergeByKey[{data1, data2, ...}, n] merges \
the datasets treating the first n columns of each dataset \
as the key. If n is omitted, the first column is used as the key. \
MergeByKey[datasets, list] uses the list of column numbers to \
specify the key columns which are placed first in the merged dataset.
\
Only rows that have matching keys in all datasets appear in the \
merged dataset. MergeByKey assumes there are no duplicate keys \
in each of the datasets."
SubsetPosition::usage = "SubsetPosition[set, subset] returns the \
positions of subset in the set, assuming both set and subset are \
sorted and have no duplicates."
MergeByKey[datalist:{__List}, n_:1] :=
Module[{i, keylist, ikeys, poslist, dlist},
i = Range[n];
keylist = #[[All, i]] & /@ datalist;
ikeys = Intersection @@ keylist;
poslist = SubsetPosition[#, ikeys]& /@ keylist;
dlist = MapThread[#1[[#2, n + 1 ;;]] &, {datalist, poslist}];
Join @@@ Transpose[Prepend[dlist, ikeys]]
]
MergeByKey[datalist:{__List}, index_List] :=
Module[{orderlist},
orderlist =
Flatten[Join[index, Complement[Range[Length[#[[1]]]], index]]]& /@
datalist;
MergeByKey[MapThread[#1[[All, #2]]&, {datalist, orderlist}],
Length[index]]
]
SubsetPosition[superset:{__Integer}, subset:{__Integer}] :=
sspCompiled[superset, subset]
SubsetPosition[superset_, subset_] :=
Module[{i = 1},
(While[superset[[i]] != #, i++]; i++) & /@ subset
]
sspCompiled = Compile[{
{superset, _Integer, 1},
{subset, _Integer, 1}},
Module[{i = 1},
(While[superset[[i]] != #, i++]; i++) & /@ subset
]]
(* code ends here *)