Re: Combining data from indexed lists efficiently

• To: mathgroup at smc.vnet.net
• Subject: [mg106195] Re: Combining data from indexed lists efficiently
• From: dh <dh at metrohm.com>
• Date: Tue, 5 Jan 2010 01:42:12 -0500 (EST)
• References: <hhshpl\$kqv\$1@smc.vnet.net>

```
Hi Steve,

one way to do it:

1-Join the lists

2-gather all entries with the same label

3-separate label and values

The following code will do this:

t = Join[list1, list2, list3];

GatherBy[t, First] /. x : {{a_, _} ..} :> {a, x[[All, 2]] }

Daniel

Steve W. Brewer wrote:

> I have several lists of the format:

>

>     { {index1, value}, {index2, value}, ... {indexN, value} }

>

> For example:

>

>     list1 = { {"A", 1}, {"B",  2}, {"C",  3}, {"D", 4} }

>     list2 = { {"A", 5}, {"B",  6}, {"D",  7}, {"E", 8} }

>     list3 = { {"A", 9}, {"B", 10}, {"C", 11} }

>

> The indexes are not necessarily strings; they may be any expression.  (In

> the specific case I'm addressing now, each index is a list representing a

> date/time in the format returned by DateList[].)  The lists are not

> necessarily the same length.  Also, while most of the indexes appear in all

> lists, there are some holes (missing data).

>

> I want to combine the lists into a single list of the format:

>

>     { { index1, {value1, value2, ... valueN} },

>       { index2, {value1, value2, ... valueN} },

>       ...

>       { indexN, {value1, value2, ... valueN} } }

>

> Only the data points with indexes appearing in all lists should be included;

> the rest should be dropped.  Also, I want to include some derived values

> along with the original data values.

>

> Using the sample data above, let's say I want to include two derived values

> from the functions:

>

>     f1[list1Data_, list2Data_] := list1Data + list2Data

>     f2[list2Data_, list3Data_] := list2Data + list3Data

>

> The result would be:

>

>     combinedList = { { "A", {1, 5,  9, 6, 14} },

>                      { "B", {2, 6, 10, 8, 16} } }

>

> I have a solution that works fine on "small" data sets. However, it's

> impractically slow on the "large" data sets I really need to run it on (over

> 100k elements in each list).

>

> Here's what I'm doing now:

>

>

>     (* This part executes pretty quickly *)

>

>     indexesToUse =

>         Intersection[First /@ list1, First /@ list2, First /@ list3];

>

>     valueAtIndex[index_, list_] :=

>         Cases[list, {index, _}, 1, 1] // First // Last;

>

>     dataAtIndex[index_] := Block[

>         {v1, v2, v3, vf1, vf2},

>

>         v1 = valueAtIndex[index, list1];

>         v2 = valueAtIndex[index, list2];

>         v3 = valueAtIndex[index, list3];

>

>         vf1 = f1[v1, v2];

>         vf2 = f2[v2, v3];

>

>         {v1, v2, v3, vf1, vf2}

>     ];

>

>     (* This is where it bogs down *)

>

>     combinedList =

>         Function[{index}, {index, dataAtIndex[index]}] /@ indexesToUse;

>

>

> This is all inside an enclosing Module[] along with some other code, and the

> actual code is a little more complex (e.g. more than three lists, more than

> two derived-value functions).  The derived-value functions themselves are

> mostly simple algebra; I doubt they're the source of the bottleneck, and in

> any case, I can't change them.  (I *can* change the way they're applied,

> though, if it makes a difference.)

>

> I *think* the bottleneck is probably in my repeated calls to Cases[] to find

> particular data points, but that's just a guess.

>

> Is there a more efficient way of doing this that would speed things up

> significantly?

>

> Thanks!

>

>

> Steve W. Brewer

>

>

```

• Prev by Date: Re: More /.{I->-1} craziness
• Next by Date: Re: More /.{I->-1} craziness
• Previous by thread: Re: Combining data from indexed lists efficiently
• Next by thread: inequality as constraints on NDSolve