MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Combining data from indexed lists efficiently

  • To: mathgroup at smc.vnet.net
  • Subject: [mg106182] Combining data from indexed lists efficiently
  • From: "Steve W. Brewer" <steve at take5.org>
  • Date: Mon, 4 Jan 2010 05:59:52 -0500 (EST)

I have several lists of the format:

    { {index1, value}, {index2, value}, ... {indexN, value} }

For example:

    list1 = { {"A", 1}, {"B",  2}, {"C",  3}, {"D", 4} }
    list2 = { {"A", 5}, {"B",  6}, {"D",  7}, {"E", 8} }
    list3 = { {"A", 9}, {"B", 10}, {"C", 11} }

The indexes are not necessarily strings; they may be any expression.  (In
the specific case I'm addressing now, each index is a list representing a
date/time in the format returned by DateList[].)  The lists are not
necessarily the same length.  Also, while most of the indexes appear in all
lists, there are some holes (missing data).

I want to combine the lists into a single list of the format:

    { { index1, {value1, value2, ... valueN} },
      { index2, {value1, value2, ... valueN} },
      ...
      { indexN, {value1, value2, ... valueN} } }

Only the data points with indexes appearing in all lists should be included;
the rest should be dropped.  Also, I want to include some derived values
along with the original data values.

Using the sample data above, let's say I want to include two derived values
from the functions:

    f1[list1Data_, list2Data_] := list1Data + list2Data
    f2[list2Data_, list3Data_] := list2Data + list3Data

The result would be:

    combinedList = { { "A", {1, 5,  9, 6, 14} },
                     { "B", {2, 6, 10, 8, 16} } }

I have a solution that works fine on "small" data sets. However, it's
impractically slow on the "large" data sets I really need to run it on (over
100k elements in each list).

Here's what I'm doing now:


    (* This part executes pretty quickly *)

    indexesToUse =
        Intersection[First /@ list1, First /@ list2, First /@ list3];

    valueAtIndex[index_, list_] :=
        Cases[list, {index, _}, 1, 1] // First // Last;

    dataAtIndex[index_] := Block[
        {v1, v2, v3, vf1, vf2},

        v1 = valueAtIndex[index, list1];
        v2 = valueAtIndex[index, list2];
        v3 = valueAtIndex[index, list3];

        vf1 = f1[v1, v2];
        vf2 = f2[v2, v3];

        {v1, v2, v3, vf1, vf2}
    ];

    (* This is where it bogs down *)

    combinedList =
        Function[{index}, {index, dataAtIndex[index]}] /@ indexesToUse;


This is all inside an enclosing Module[] along with some other code, and the
actual code is a little more complex (e.g. more than three lists, more than
two derived-value functions).  The derived-value functions themselves are
mostly simple algebra; I doubt they're the source of the bottleneck, and in
any case, I can't change them.  (I *can* change the way they're applied,
though, if it makes a difference.)

I *think* the bottleneck is probably in my repeated calls to Cases[] to find
particular data points, but that's just a guess.

Is there a more efficient way of doing this that would speed things up
significantly?

Thanks!


Steve W. Brewer



  • Prev by Date: Re: Re: More /.{I->-1} craziness
  • Next by Date: Re: Re: More /.{I->-1} craziness
  • Previous by thread: Re: "more kernels than you are licensed"
  • Next by thread: Re: Combining data from indexed lists efficiently