Combining data from indexed lists efficiently
- To: mathgroup at smc.vnet.net
- Subject: [mg106182] Combining data from indexed lists efficiently
- From: "Steve W. Brewer" <steve at take5.org>
- Date: Mon, 4 Jan 2010 05:59:52 -0500 (EST)
I have several lists of the format: { {index1, value}, {index2, value}, ... {indexN, value} } For example: list1 = { {"A", 1}, {"B", 2}, {"C", 3}, {"D", 4} } list2 = { {"A", 5}, {"B", 6}, {"D", 7}, {"E", 8} } list3 = { {"A", 9}, {"B", 10}, {"C", 11} } The indexes are not necessarily strings; they may be any expression. (In the specific case I'm addressing now, each index is a list representing a date/time in the format returned by DateList[].) The lists are not necessarily the same length. Also, while most of the indexes appear in all lists, there are some holes (missing data). I want to combine the lists into a single list of the format: { { index1, {value1, value2, ... valueN} }, { index2, {value1, value2, ... valueN} }, ... { indexN, {value1, value2, ... valueN} } } Only the data points with indexes appearing in all lists should be included; the rest should be dropped. Also, I want to include some derived values along with the original data values. Using the sample data above, let's say I want to include two derived values from the functions: f1[list1Data_, list2Data_] := list1Data + list2Data f2[list2Data_, list3Data_] := list2Data + list3Data The result would be: combinedList = { { "A", {1, 5, 9, 6, 14} }, { "B", {2, 6, 10, 8, 16} } } I have a solution that works fine on "small" data sets. However, it's impractically slow on the "large" data sets I really need to run it on (over 100k elements in each list). Here's what I'm doing now: (* This part executes pretty quickly *) indexesToUse = Intersection[First /@ list1, First /@ list2, First /@ list3]; valueAtIndex[index_, list_] := Cases[list, {index, _}, 1, 1] // First // Last; dataAtIndex[index_] := Block[ {v1, v2, v3, vf1, vf2}, v1 = valueAtIndex[index, list1]; v2 = valueAtIndex[index, list2]; v3 = valueAtIndex[index, list3]; vf1 = f1[v1, v2]; vf2 = f2[v2, v3]; {v1, v2, v3, vf1, vf2} ]; (* This is where it bogs down *) combinedList = Function[{index}, {index, dataAtIndex[index]}] /@ indexesToUse; This is all inside an enclosing Module[] along with some other code, and the actual code is a little more complex (e.g. more than three lists, more than two derived-value functions). The derived-value functions themselves are mostly simple algebra; I doubt they're the source of the bottleneck, and in any case, I can't change them. (I *can* change the way they're applied, though, if it makes a difference.) I *think* the bottleneck is probably in my repeated calls to Cases[] to find particular data points, but that's just a guess. Is there a more efficient way of doing this that would speed things up significantly? Thanks! Steve W. Brewer
- Follow-Ups:
- Re: Combining data from indexed lists efficiently
- From: Leonid Shifrin <lshifr@gmail.com>
- Re: Combining data from indexed lists efficiently
- From: DrMajorBob <btreat1@austin.rr.com>
- Re: Combining data from indexed lists efficiently
- From: Daniel Lichtblau <danl@wolfram.com>
- Re: Combining data from indexed lists efficiently