MathGroup Archive 2007

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Database like "left join" or SAS like Merge function

  • To: mathgroup at smc.vnet.net
  • Subject: [mg83608] Re: Database like "left join" or SAS like Merge function
  • From: Mark Fisher <particlefilter at gmail.com>
  • Date: Sat, 24 Nov 2007 04:16:52 -0500 (EST)
  • References: <fi6abc$oh3$1@smc.vnet.net>

On Nov 23, 5:34 am, Peter <pjcros... at gmail.com> wrote:
> Is there a builtin function that can "join" (merge) two two-
> dimensional data lists based on a single or multiple key column?
>
> This seems like such a basic requirement that I feel like I must be
> missing something obvious.
>
> Thanks, Peter

Hi Peter,

I'm not aware of any built-in function that does what you want. Here's
something I wrote for myself that may be related to what you're
looking for. The main function is MergeByKey. It calls a helper
function SubsetPosition.

--Mark

(* code starts here *)

MergeByKey::usage = "MergeByKey[{data1, data2, ...}, n] merges \
the datasets treating the first n columns of each dataset \
as the key. If n is omitted, the first column is used as the key. \
MergeByKey[datasets, list] uses the list of column numbers to \
specify the key columns which are placed first in the merged dataset.
\
Only rows that have matching keys in all datasets appear in the \
merged dataset. MergeByKey assumes there are no duplicate keys \
in each of the datasets."

SubsetPosition::usage = "SubsetPosition[set, subset] returns the \
positions of subset in the set, assuming both set and subset are \
sorted and have no duplicates."

MergeByKey[datalist:{__List}, n_:1] :=
  Module[{i, keylist, ikeys, poslist, dlist},
  i = Range[n];
  keylist = #[[All, i]] & /@ datalist;
  ikeys = Intersection @@ keylist;
  poslist = SubsetPosition[#, ikeys]& /@ keylist;
  dlist = MapThread[#1[[#2, n + 1 ;;]] &, {datalist, poslist}];
  Join @@@ Transpose[Prepend[dlist, ikeys]]
  ]

MergeByKey[datalist:{__List}, index_List] :=
  Module[{orderlist},
  orderlist =
    Flatten[Join[index, Complement[Range[Length[#[[1]]]], index]]]& /@
      datalist;
  MergeByKey[MapThread[#1[[All, #2]]&, {datalist, orderlist}],
Length[index]]
  ]

SubsetPosition[superset:{__Integer}, subset:{__Integer}] :=
sspCompiled[superset, subset]

SubsetPosition[superset_, subset_] :=
  Module[{i = 1},
  (While[superset[[i]] != #, i++]; i++) & /@ subset
  ]

sspCompiled = Compile[{
    {superset, _Integer, 1},
    {subset, _Integer, 1}},
  Module[{i = 1},
  (While[superset[[i]] != #, i++]; i++) & /@ subset
  ]]

(* code ends here *)


  • Prev by Date: Mathematica SIG (Northern Virginia and Washington DC)
  • Next by Date: Button[label, action] - InputString[]?
  • Previous by thread: Re: Database like "left join" or SAS like Merge function
  • Next by thread: Re: Database like "left join" or SAS like Merge function