Re: Combining data from indexed lists efficiently
- To: mathgroup at smc.vnet.net
- Subject: [mg106195] Re: Combining data from indexed lists efficiently
- From: dh <dh at metrohm.com>
- Date: Tue, 5 Jan 2010 01:42:12 -0500 (EST)
- References: <hhshpl$kqv$1@smc.vnet.net>
Hi Steve,
Here is one way to do it:
1. Join the lists.
2. Gather all entries that share the same label.
3. Separate each label from its values.
The following code does this:
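t = Join[list1, list2, list3];  (* all {label, value} pairs in one list *)
(* group the pairs by label, then rewrite each group as {label, {values}} *)
GatherBy[t, First] /. x : {{a_, _} ..} :> {a, x[[All, 2]]}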
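The code above keeps every label, so "C", "D" and "E" still appear in the result. As a rough sketch only, the same gather step could be extended to what the question asks for: labels present in all lists, plus the derived values. It assumes that no label occurs more than once within a single list. The names lists and groups are illustrative; f1, f2 and the sample data are copied from the question.

```mathematica
(* Sample data and derived-value functions, copied from the question *)
list1 = {{"A", 1}, {"B", 2}, {"C", 3}, {"D", 4}};
list2 = {{"A", 5}, {"B", 6}, {"D", 7}, {"E", 8}};
list3 = {{"A", 9}, {"B", 10}, {"C", 11}};
f1[list1Data_, list2Data_] := list1Data + list2Data
f2[list2Data_, list3Data_] := list2Data + list3Data

lists = {list1, list2, list3};   (* illustrative name *)
t = Join @@ lists;

(* Keep only groups holding one entry per list.
   Assumption: a label never repeats within a single list. *)
groups = Select[GatherBy[t, First], Length[#] == Length[lists] &];

(* GatherBy keeps the order of t, so the values of each group
   come out in list1, list2, list3 order *)
combinedList = Function[g,
   With[{v = g[[All, 2]]},
    {g[[1, 1]], Join[v, {f1[v[[1]], v[[2]]], f2[v[[2]], v[[3]]]}]}]] /@ groups

(* -> {{"A", {1, 5, 9, 6, 14}}, {"B", {2, 6, 10, 8, 16}}} *)
```

This makes a single pass over the joined data instead of calling Cases once per index and per list, so it should scale much better to lists of 100k elements, although no timings were measured here.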
Daniel
Steve W. Brewer wrote:
> I have several lists of the format:
>
> { {index1, value}, {index2, value}, ... {indexN, value} }
>
> For example:
>
> list1 = { {"A", 1}, {"B", 2}, {"C", 3}, {"D", 4} }
> list2 = { {"A", 5}, {"B", 6}, {"D", 7}, {"E", 8} }
> list3 = { {"A", 9}, {"B", 10}, {"C", 11} }
>
> The indexes are not necessarily strings; they may be any expression. (In
> the specific case I'm addressing now, each index is a list representing a
> date/time in the format returned by DateList[].) The lists are not
> necessarily the same length. Also, while most of the indexes appear in all
> lists, there are some holes (missing data).
>
> I want to combine the lists into a single list of the format:
>
> { { index1, {value1, value2, ... valueN} },
>   { index2, {value1, value2, ... valueN} },
>   ...
>   { indexN, {value1, value2, ... valueN} } }
>
> Only the data points with indexes appearing in all lists should be included;
> the rest should be dropped. Also, I want to include some derived values
> along with the original data values.
>
> Using the sample data above, let's say I want to include two derived values
> from the functions:
>
> f1[list1Data_, list2Data_] := list1Data + list2Data
> f2[list2Data_, list3Data_] := list2Data + list3Data
>
> The result would be:
>
> combinedList = { { "A", {1, 5, 9, 6, 14} },
>                  { "B", {2, 6, 10, 8, 16} } }
>
> I have a solution that works fine on "small" data sets. However, it's
> impractically slow on the "large" data sets I really need to run it on (over
> 100k elements in each list).
>
> Here's what I'm doing now:
>
>
> (* This part executes pretty quickly *)
>
> indexesToUse =
>   Intersection[First /@ list1, First /@ list2, First /@ list3];
>
> valueAtIndex[index_, list_] :=
>   Cases[list, {index, _}, 1, 1] // First // Last;
>
> dataAtIndex[index_] := Block[
>   {v1, v2, v3, vf1, vf2},
>
>   v1 = valueAtIndex[index, list1];
>   v2 = valueAtIndex[index, list2];
>   v3 = valueAtIndex[index, list3];
>
>   vf1 = f1[v1, v2];
>   vf2 = f2[v2, v3];
>
>   {v1, v2, v3, vf1, vf2}
> ];
>
> (* This is where it bogs down *)
>
> combinedList =
>   Function[{index}, {index, dataAtIndex[index]}] /@ indexesToUse;
>
>
> This is all inside an enclosing Module[] along with some other code, and the
> actual code is a little more complex (e.g. more than three lists, more than
> two derived-value functions). The derived-value functions themselves are
> mostly simple algebra; I doubt they're the source of the bottleneck, and in
> any case, I can't change them. (I *can* change the way they're applied,
> though, if it makes a difference.)
>
> I *think* the bottleneck is probably in my repeated calls to Cases[] to find
> particular data points, but that's just a guess.
>
> Is there a more efficient way of doing this that would speed things up
> significantly?
>
> Thanks!
>
>
> Steve W. Brewer