MathGroup Archive 2007

Re: grouping similar list elements with gaps

  • To: mathgroup at
  • Subject: [mg73529] Re: grouping similar list elements with gaps
  • From: "Ray Koopman" <koopman at>
  • Date: Wed, 21 Feb 2007 01:46:08 -0500 (EST)
  • References: <><erbgj9$dsg$>

On Feb 18, 10:37 pm, Stern <nycst... at> wrote:
> At any given time, I care only about the data above the threshold, or
> only about the data below the threshold. There is nothing fuzzy about
> that -- if I'm studying the periods where the variable is over 5, then
> 5.00001 counts just as 500 does. What I am trying to capture in my
> original question is the situation where there is a period over 5,
> then a gap of one or two time units when it slips below 5, then a
> period above 5 again.
> Thanks for any advice,
> Michael
> On 2/18/07, Chris Chiasson <c... at> wrote:
>> how "far above or below" the threshold are you willing to go?
>> On 2/18/07, Stern <nycst... at> wrote:
>>> I work with time series data of the form
>>> {{timecode1,datum1},{timecode2,datum2},...}. The timecodes can be in
>>> any of several formats, but for internal calculations I convert them
>>> to "Mathematica integer" format, which is to say, the absolute number
>>> of seconds since the beginning of January 1, 1900.
>>> My current interest involves continuous runs of dates above or below
>>> a defined threshold. This is relatively easy, using the Split and
>>> Select commands. For example,
>>> Select[Split[TIMESERIESLIST, Sign[#1[[2]] - THRESHOLD] ==
>>>   Sign[#2[[2]] - THRESHOLD] &], (Min[Transpose[#][[2]]] >= THRESHOLD
>>> ) &]
>>> (Thanks to Bob Hanlon, for suggesting this basic approach).
>>> I would like to generalize this to handle cases where there are small
>>> gaps in the pattern. So, for example, if I am willing to tolerate a
>>> gap of 3, then if list members 3-100 are above the threshold and list
>>> members 102-200 are above the threshold, then the entire period 3-200
>>> is marked as above, though time unit 101 would, on its own, fail.
>>> This may need to be handled recursively, as combined periods above the
>>> threshold may fall close enough together that they should be combined
>>> in turn.
>>> I have thought of some relatively inelegant ways of handling this
>>> ("preprocessing" the time series to create a dummy list in which gaps
>>> have been adjusted over the threshold), but I feel as though there
>>> ought to be a better way to handle it.
>>> Thanks in advance for any help,
>>> Michael

First append a "wanted" indicator to each {t,d} term,
then split as before.

  x = Split[ Append[#, Last@# > THRESHOLD]& /@ TDLIST,
             Last@#1 == Last@#2 &];
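
For illustration, here is roughly what the split produces on a small
made-up series. The values of THRESHOLD and TDLIST below are assumptions
chosen for the example, not data from the thread.

```mathematica
(* Made-up sample data: unit-spaced timecodes, threshold of 5 *)
THRESHOLD = 5;
TDLIST = {{1, 6}, {2, 7}, {3, 4}, {4, 8}};

(* Tag each {t,d} with True/False, then group adjacent equal tags *)
x = Split[Append[#, Last@# > THRESHOLD] & /@ TDLIST,
          Last@#1 == Last@#2 &]
(* {{{1, 6, True}, {2, 7, True}}, {{3, 4, False}}, {{4, 8, True}}} *)
```

Each block is a run of consecutive terms sharing the same indicator, so
x[[k,1,3]] tells whether block k is wanted.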

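Wanted and unwanted blocks alternate. Merge each pair of consecutive
wanted blocks whose gap, measured from the last timecode of one to the
first timecode of the next, is TMIN or less, absorbing the
originally-unwanted block between them. TMIN is thus the largest gap
to tolerate, in timecode units; with unit-spaced data a single
below-threshold point gives a gap of 2. A merged block is compared
again with the next wanted block, so chains of close runs collapse
into one. (This is a little klutzy, but it seems to work.)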

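  (* k indexes the first wanted block: 1 if x starts with one, else 2 *)
  k = If[x[[1,1,3]], 1, 2];
  While[k + 2 <= Length@x,
        (* gap too large: move on to the next wanted block *)
        If[x[[k,-1,1]] + TMIN < x[[k+2,1,1]], k += 2,
           (* otherwise replace blocks k..k+2 with their concatenation *)
           x = Insert[Drop[x,{k,k+2}],Join@@Take[x,{k,k+2}],k]]];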

Finally, select the wanted blocks, and strip the indicators.

  Map[Most, Select[x, #[[1,3]]&], {2}]
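
Putting the three steps together, a minimal sketch on made-up data might
look like the following. The wrapper name mergeRuns, its argument order,
and the sample series are all hypothetical. The starting index is written
as If[x[[1,1,3]], 1, 2], which should agree with 1 + Boole@x[[2,1,3]]
whenever there are at least two blocks and also copes with a single block.

```mathematica
(* Hypothetical wrapper; name and argument order are assumptions *)
mergeRuns[tdlist_List, threshold_, tmin_] :=
  Module[{x, k},
    (* tag each {t,d} with a wanted indicator and split into runs *)
    x = Split[Append[#, Last@# > threshold] & /@ tdlist,
              Last@#1 == Last@#2 &];
    (* index of the first wanted block *)
    k = If[x[[1, 1, 3]], 1, 2];
    (* merge wanted blocks whose timecode gap is tmin or less *)
    While[k + 2 <= Length@x,
      If[x[[k, -1, 1]] + tmin < x[[k + 2, 1, 1]],
        k += 2,
        x = Insert[Drop[x, {k, k + 2}], Join @@ Take[x, {k, k + 2}], k]]];
    (* keep the wanted blocks and strip the indicators *)
    Map[Most, Select[x, #[[1, 3]] &], {2}]]

(* Made-up series: one-point dip at t = 3, three-point dip at t = 6..8 *)
mergeRuns[{{1, 6}, {2, 7}, {3, 4}, {4, 8}, {5, 9},
           {6, 2}, {7, 1}, {8, 3}, {9, 6}}, 5, 2]
(* {{{1, 6}, {2, 7}, {3, 4}, {4, 8}, {5, 9}}, {{9, 6}}} *)
```

With tmin = 2 the lone dip at t = 3 is absorbed (gap 4 - 2 = 2), while
the longer dip leaves a gap of 9 - 5 = 4, so the final run stays separate.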
