Re: Combining data from indexed lists efficiently
- To: mathgroup at smc.vnet.net
- Subject: [mg106228] Re: Combining data from indexed lists efficiently
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Tue, 5 Jan 2010 01:48:52 -0500 (EST)
On 1/4/10 at 5:59 AM, steve at take5.org (Steve W. Brewer) wrote:

>I have several lists of the format:
>{ {index1, value}, {index2, value}, ... {indexN, value} }
>For example:
>list1 = { {"A", 1}, {"B", 2}, {"C", 3}, {"D", 4} }
>list2 = { {"A", 5}, {"B", 6}, {"D", 7}, {"E", 8} }
>list3 = { {"A", 9}, {"B", 10}, {"C", 11} }
>The indexes are not necessarily strings; they may be any expression.
>(In the specific case I'm addressing now, each index is a list
>representing a date/time in the format returned by DateList[].) The
>lists are not necessarily the same length. Also, while most of the
>indexes appear in all lists, there are some holes (missing data).
>I want to combine the lists into a single list of the format:
>{ { index1, {value1, value2, ... valueN} },
>{ index2, {value1, value2, ... valueN} },
>...
>{ indexN, {value1, value2, ... valueN} } }
>Only the data points with indexes appearing in all lists should be
>included; the rest should be dropped. Also, I want to include some
>derived values along with the original data values.
>Using the sample data above, let's say I want to include two derived
>values from the functions:
>
>f1[list1Data_, list2Data_] := list1Data + list2Data
>f2[list2Data_, list3Data_] := list2Data + list3Data
>
>The result would be:
>combinedList = { { "A", {1, 5, 9, 6, 14} },
>{ "B", {2, 6, 10, 8, 16} } }
>I have a solution that works fine on "small" data sets. However,
>it's impractically slow on the "large" data sets I really need to
>run it on (over 100k elements in each list).
>Here's what I'm doing now:

<details snipped>

If you are using version 7, then what you want can be achieved with a lot less code by doing:

In[7]:= list1 = {{"A", 1}, {"B", 2}, {"C", 3}, {"D", 4}};
list2 = {{"A", 5}, {"B", 6}, {"D", 7}, {"E", 8}};
list3 = {{"A", 9}, {"B", 10}, {"C", 11}};

In[10]:= {#[[1, 1]], #[[All, 2]]} & /@ GatherBy[Join[list1, list2, list3], First]

Out[10]= {{"A", {1, 5, 9}}, {"B", {2, 6, 10}}, {"C", {3, 11}}, {"D", {4, 7}}, {"E", {8}}}

I don't know how well this will scale to 100K+ items per list. My guess is this will perform better than what you indicated you had tried.
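The GatherBy output above still includes indexes missing from some lists ("C", "D", "E") and does not yet carry the derived values. One possible way to finish the job is sketched below, assuming no index is repeated within a single list (otherwise a group's length no longer tells you how many lists contain that index). It keeps only groups with one value per list, then appends f1 and f2 from the question. The names lists, gathered, common and combined are illustrative only, and the approach has not been timed on 100K-element lists.

```mathematica
(* Sketch only; assumes each index occurs at most once per list *)
list1 = {{"A", 1}, {"B", 2}, {"C", 3}, {"D", 4}};
list2 = {{"A", 5}, {"B", 6}, {"D", 7}, {"E", 8}};
list3 = {{"A", 9}, {"B", 10}, {"C", 11}};
lists = {list1, list2, list3};

(* derived-value functions from the question *)
f1[list1Data_, list2Data_] := list1Data + list2Data;
f2[list2Data_, list3Data_] := list2Data + list3Data;

(* group by index; GatherBy keeps the original order, so in a group
   that has all three values they are ordered list1, list2, list3 *)
gathered = {#[[1, 1]], #[[All, 2]]} & /@ GatherBy[Join @@ lists, First];

(* keep only indexes that have one value from every list *)
common = Select[gathered, Length[#[[2]]] == Length[lists] &];

(* append f1[v1, v2] and f2[v2, v3] to each value list *)
combined = {#[[1]], Join[#[[2]], {f1 @@ #[[2, {1, 2}]], f2 @@ #[[2, {2, 3}]]}]} & /@ common
```

With the sample data the last line evaluates to {{"A", {1, 5, 9, 6, 14}}, {"B", {2, 6, 10, 8, 16}}}, which matches the combinedList given in the question. Because GatherBy uses the whole first element as the key, DateList[]-style index lists should work the same way as the strings here.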