Re: Combining data from indexed lists efficiently

*To*: mathgroup at smc.vnet.net*Subject*: [mg106195] Re: Combining data from indexed lists efficiently*From*: dh <dh at metrohm.com>*Date*: Tue, 5 Jan 2010 01:42:12 -0500 (EST)*References*: <hhshpl$kqv$1@smc.vnet.net>

Hi Steve, one way to do it: 1-Join the lists 2-gather all entries with the same label 3-separate label and values The following code will do this: t = Join[list1, list2, list3]; GatherBy[t, First] /. x : {{a_, _} ..} :> {a, x[[All, 2]] } Daniel Steve W. Brewer wrote: > I have several lists of the format: > > { {index1, value}, {index2, value}, ... {indexN, value} } > > For example: > > list1 = { {"A", 1}, {"B", 2}, {"C", 3}, {"D", 4} } > list2 = { {"A", 5}, {"B", 6}, {"D", 7}, {"E", 8} } > list3 = { {"A", 9}, {"B", 10}, {"C", 11} } > > The indexes are not necessarily strings; they may be any expression. (In > the specific case I'm addressing now, each index is a list representing a > date/time in the format returned by DateList[].) The lists are not > necessarily the same length. Also, while most of the indexes appear in all > lists, there are some holes (missing data). > > I want to combine the lists into a single list of the format: > > { { index1, {value1, value2, ... valueN} }, > { index2, {value1, value2, ... valueN} }, > ... > { indexN, {value1, value2, ... valueN} } } > > Only the data points with indexes appearing in all lists should be included; > the rest should be dropped. Also, I want to include some derived values > along with the original data values. > > Using the sample data above, let's say I want to include two derived values > from the functions: > > f1[list1Data_, list2Data_] := list1Data + list2Data > f2[list2Data_, list3Data_] := list2Data + list3Data > > The result would be: > > combinedList = { { "A", {1, 5, 9, 6, 14} }, > { "B", {2, 6, 10, 8, 16} } } > > I have a solution that works fine on "small" data sets. However, it's > impractically slow on the "large" data sets I really need to run it on (over > 100k elements in each list). > > Here's what I'm doing now: > > > (* This part executes pretty quickly *) > > indexesToUse = > Intersection[First /@ list1, First /@ list2, First /@ list3]; > > valueAtIndex[index_, list_] := > Cases[list, {index, _}, 1, 1] // First // Last; > > dataAtIndex[index_] := Block[ > {v1, v2, v3, vf1, vf2}, > > v1 = valueAtIndex[index, list1]; > v2 = valueAtIndex[index, list2]; > v3 = valueAtIndex[index, list3]; > > vf1 = f1[v1, v2]; > vf2 = f2[v2, v3]; > > {v1, v2, v3, vf1, vf2} > ]; > > (* This is where it bogs down *) > > combinedList = > Function[{index}, {index, dataAtIndex[index]}] /@ indexesToUse; > > > This is all inside an enclosing Module[] along with some other code, and the > actual code is a little more complex (e.g. more than three lists, more than > two derived-value functions). The derived-value functions themselves are > mostly simple algebra; I doubt they're the source of the bottleneck, and in > any case, I can't change them. (I *can* change the way they're applied, > though, if it makes a difference.) > > I *think* the bottleneck is probably in my repeated calls to Cases[] to find > particular data points, but that's just a guess. > > Is there a more efficient way of doing this that would speed things up > significantly? > > Thanks! > > > Steve W. Brewer > >