MathGroup Archive: March 2005 [00118]

Re: Rearranging a data array containing calendrical as well as data entries.
To: mathgroup at smc.vnet.net
Subject: [mg54858] Re: [mg54827] Rearranging a data array containing calendrical as well as data entries.
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Fri, 4 Mar 2005 05:07:53 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message-----
>From: Gilmar [mailto:gilmar.rodriguez at nwfwmd.state.fl.us] 
>Sent: Thursday, March 03, 2005 4:29 AM
>Subject: [mg54858] [mg54827] Rearranging a data array containing 
>calendrical as well as data entries.
>
>Dear Mathematica User Friends:
>
>I have a file containing flow data from the USGS, in the following
>format:
>
>1999 1 1 489.82 489.82 495.01 495.01 495.01 495.01 495.01 490.51
>1999 1 2 490.51 490.51 490.51 490.51 490.38 490.38 490.38 490.38
>1999 1 3 490.38 510.38 510.38 510.38 510.38 510.38 528.66 528.66
>1999 1 4 528.66 528.66 528.66 501.68 501.68 501.68 501.68
>1999 2 1 501.68 496.44 496.44 496.44 496.44 496.44 478.72 478.72
>1999 2 2 478.72 478.72 478.72 452.82 452.82 452.82 452.82 452.82
>1999 2 3 450.19 450.19 450.19 450.19 450.19 443.98 443.98 443.98
>1999 2 4 443.98 443.98 440.14 440.14
>1999 3 1 440.14 440.14 440.14 453.64 453.64 453.64 453.64 453.64
>1999 3 2 503.98 503.98 503.98 503.98 503.98 500.84 500.84 500.84
>1999 3 3 500.84 500.84 473.48 473.48 473.48 473.48 473.48 463.19
>1999 3 4 463.19 463.19 463.19 463.19 457.54 457.54 457.54
>
>This format is used by the USGS to compress their data records.
>
>Each row contains:
>Year, Month Number(1 to 12), Row Number (1 to 4), and data entries.
>
>The first row:
>1999 1 1 489.82 489.82 495.01 495.01 495.01 495.01 495.01 490.51
>contains flow values corresponding to: January 1 to January 8,
>of the year 1999.
>
>The second row:
>1999 1 2 490.51 490.51 490.51 490.51 490.38 490.38 490.38 490.38
>contains flow values corresponding to: January 9 to January 16,
>of the year 1999.
>
>The third row:
>1999 1 3 490.38 510.38 510.38 510.38 510.38 510.38 528.66 528.66
>contains flow values corresponding to: January 17 to January 24,
>of the year 1999.
>
>The fourth row:
>1999 1 4 528.66 528.66 528.66 501.68 501.68 501.68 501.68
>contains flow values corresponding to: January 25 to January 31,
>of the year 1999.
>
>I think that you get the picture of how this data set is assembled.
>
>What I need is a program that can turn the above mentioned horizontal
>array, into a simple vertical array, containing two columns;
>the first column contains the dates when the data was collected,
>and the second column contains the flow values; i.e.
>
>01Jan1999 489.82
>02Jan1999 489.82
>03Jan1999 495.01
>etc.
>
>If I give the program a starting date, and ending date for an
>arbitrary record; the program should be able to allocate two
>arrays to:
>
>(1.) put the dates between the starting date, and ending date,
>to form the first column of the vertical array.
>
>(2.)match correctly those dates with the data to appear in
>the second column of the vertical array.
>
>The program should discern between regular years, and leap
>years.  Those of you that are still using FORTRAN, and have
>experienced how difficult it is to deal with date functions
>using FORTRAN, might sympathize with my request.
>
>P.S. To get a larger set of USGS flow data to test your program
>please download the following file:
>
>http://www.gilmarlily.netfirms.com/download/flow.dat
>
>Thank you for your help!
>
>

Gilmar,

(1) You need not ponder calendars: the program is driven by the data,
assuming the USGS knows the calendar!

(2) Your intended output format is not well suited to date selection, so
if you need it at all, apply it only at the final output.

(3) In your file the year and month are not always separated, which
makes the records ill suited to automatic parsing with Read. So we
decode each record in an extra step.


 
In[1]:= susgs = OpenRead["c:\\temp\\flow.dat"]

In[2]:= type = Table[Number, {#}] & /@ {3, 8};

In[3]:= Clear[rr]; c = 0;
In[4]:=
While[(rec = Read[susgs, Record]) =!= EndOfFile,
  rr[++c] = Read[StringToStream[StringInsert[rec, " ", 5]], type]]

In[5]:= Close[susgs]

We read each record, insert the (sometimes) missing separator after the
year, decode it with the structure given by type (thus separating the
description from the data), and assign the result to rr[record-number].

Out[5]= "c:\\temp\\flow.dat"

In[6]:= rr2 = DeleteCases[Array[rr, {c}], EndOfFile, {3}];

Where a record was "too short", Read inserted EndOfFile for the missing
values; we delete those now and collect everything into a single list
rr2.
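
As a small illustration of the decoding and clean-up steps, here is a
sketch on a single record taken from the sample in the question. The
names rec1, raw and clean are only for this example, and for a single
record EndOfFile sits at level 2 rather than level 3:

```mathematica
(* one short record from the question's sample (4 values instead of 8) *)
rec1 = "1999 2 4 443.98 443.98 440.14 440.14";

(* same structure as above: 3 numbers of description, 8 numbers of data *)
type = Table[Number, {#}] & /@ {3, 8};

(* decode from a string stream; missing values come back as EndOfFile *)
str = StringToStream[StringInsert[rec1, " ", 5]];
raw = Read[str, type];
Close[str];

(* drop the EndOfFile placeholders *)
clean = DeleteCases[raw, EndOfFile, {2}]
```

The result should have the shape
{{1999, 2, 4}, {443.98, 443.98, 440.14, 440.14}}.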


Let's look at a sample:

In[7]:= Take[rr2, {24, 25}]
Out[7]=
{{{1999, 2, 4}, {443.98, 443.98, 440.14, 440.14}},
 {{1999, 3, 1}, {440.14, 440.14, 440.14, 453.64, 453.64, 453.64, 453.64,
      453.64}}}


Now we generate a useful format, calculating the day of month:

In[8]:=
rr3 = Join @@ 
      Function[{rhead, rdata}, 
          MapIndexed[{ReplacePart[rhead, 
                  First[#2] + (rhead[[3]] - 1)*8, 3], #1} &, rdata]]
      @@@ rr2;
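
The day-of-month arithmetic can be checked on one record. Each row
holds up to 8 values, so the i-th value of row r falls on day
8 (r - 1) + i. The helper name expandRecord is made up for this sketch;
it does the same job as the Function above for a single record:

```mathematica
(* expand one {{year, month, row}, data} record into {date, value} pairs *)
expandRecord[{{year_, month_, row_}, data_List}] :=
  MapIndexed[{{year, month, 8 (row - 1) + First[#2]}, #1} &, data]

(* row 4 of February 1999 starts at day 8*3 + 1 = 25 *)
expandRecord[{{1999, 2, 4}, {443.98, 443.98, 440.14, 440.14}}]
(* {{{1999, 2, 25}, 443.98}, {{1999, 2, 26}, 443.98},
    {{1999, 2, 27}, 440.14}, {{1999, 2, 28}, 440.14}} *)
```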



We could take a range by position ...

In[9]:= Take[rr3, {178, 189}]

... but we won't calculate the indices (for that we would have to know
the calendar!); instead we simply extract the entries whose dates lie in
a given range:

In[10]:=
dd = Cases[rr3, {datum_, _} /; OrderedQ[{#1, datum, #2}]] &[
       {1999, 2, 25}, {1999, 3, 5}]
Out[10]=
{{{1999, 2, 25}, 443.98}, {{1999, 2, 26}, 443.98}, {{1999, 2, 27}, 440.14},
 {{1999, 2, 28}, 440.14}, {{1999, 3, 1}, 440.14}, {{1999, 3, 2}, 440.14},
 {{1999, 3, 3}, 440.14}, {{1999, 3, 4}, 453.64}, {{1999, 3, 5}, 453.64}}
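
The selection works because OrderedQ compares the {year, month, day}
lists element by element, which amounts to chronological order as long
as every date is a list of three integers. A minimal sketch, with
inRangeQ as a name invented for this illustration:

```mathematica
(* True when date lies between from and to, both ends included *)
inRangeQ[date_, from_, to_] := OrderedQ[{from, date, to}]

inRangeQ[{1999, 3, 1}, {1999, 2, 25}, {1999, 3, 5}]   (* True *)
inRangeQ[{1999, 3, 6}, {1999, 2, 25}, {1999, 3, 5}]   (* False *)
```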


If you now want your special format, we have to know the names of the
months:

In[11]:=
mnames = {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
      "Oct", "Nov", "Dec"};

In[12]:=
{StringDrop[
          ToString[
              NumberForm[#1[[3]], 2, 
                NumberPadding -> {"0", ""}, SignPadding -> True, 
                NumberSigns -> {" ", " "}]] <> 
            mnames[[#1[[2]]]] <> 
            ToString[#1[[1]]], 
          1], #2} & @@@ dd // TableForm


Out[12]//TableForm=
      25Feb1999	443.98
      26Feb1999	443.98
      27Feb1999	440.14
      28Feb1999	440.14
      01Mar1999	440.14
      02Mar1999	440.14
      03Mar1999	440.14
      04Mar1999	453.64
      05Mar1999	453.64
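
If your version has IntegerString (an assumption: it was added in later
releases, Mathematica 6 onwards as far as I know), the zero padding can
be written more directly. This is only a sketch of an alternative, with
fmtDate as a made-up helper name:

```mathematica
mnames = {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
      "Oct", "Nov", "Dec"};

(* {year, month, day} -> "ddMonyyyy", day padded to two digits *)
fmtDate[{y_, m_, d_}] :=
  IntegerString[d, 10, 2] <> mnames[[m]] <> IntegerString[y]

fmtDate[{1999, 3, 1}]    (* "01Mar1999" *)
fmtDate[{1999, 2, 25}]   (* "25Feb1999" *)
```

Applied to the selection above this would read
{fmtDate[#1], #2} & @@@ dd // TableForm.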



If your file is really gigantic and you only want a short part, derive
a reasonable search string from the start date of the intended segment,
locate it in the file, then estimate the number of records needed and
read them. This can be done in a rather crude fashion, e.g.:

In[13]:= startDate = {1999, 2, 25};
         endDate = {1999, 3, 5};

In[15]:= startString = ToString[startDate[[1]]]
Out[15]= "1999"

In[16]:= norecs = (endDate[[1]] - startDate[[1]] + 1)*12*4
Out[16]= 48
 


In[17]:= susgs = OpenRead["c:\\temp\\flow.dat"]

In[18]:= rec = Find[susgs, startString]
Out[18]=
"1999 1 1 489.82 489.82 495.01 495.01 495.01 495.01 495.01 490.51"

In[19]:= Clear[rr]

In[20]:=
rr[c = 1] = Read[StringToStream[StringInsert[rec, " ", 5]], type]
Out[20]=
{{1999, 1, 1}, {489.82, 489.82, 495.01, 495.01, 495.01, 495.01, 495.01, 
    490.51}}

In[21]:=
While[c < norecs && (rec = Read[susgs, Record]) =!= EndOfFile,
  rr[++c] = Read[StringToStream[StringInsert[rec, " ", 5]], type]]

In[22]:= Close[susgs]


In[23]:= rr2 = .....

continue as usual.


--
Hartmut Wolf