MathGroup Archive 2010

Re: Converting date strings to DateList format - a need for speed

  • To: mathgroup at smc.vnet.net
  • Subject: [mg106542] Re: Converting date strings to DateList format - a need for speed
  • From: Albert Retey <awnl at gmx-topmail.de>
  • Date: Fri, 15 Jan 2010 07:00:24 -0500 (EST)
  • References: <hip88s$sqg$1@smc.vnet.net>

On 15.01.2010 09:16, Garapata wrote:
> I have dates in the first column of large flat files (6000+ rows).
> The flat files have a header row and may have as many as 20 columns of
> data besides the dates.  A sample of a flat file follows
> 
> dataFile = {{"DATES", "DATA1", "DATA2"}, {"12/31/84", 1, 1},
> {"01/01/85", 1,
>   1.00239}, {"01/02/85", 0.999206, 1.00238}, {"01/03/85", 0.997425,
>   1.00238}, {"01/04/85", 0.997038, 1.00237}, {"01/07/85", 0.989256,
>   1.0071}, {"01/08/85", 1.00867, 1.00235}, {"01/09/85", 0.994708,
>   1.00235}, {"01/10/85", 1.00552, 1.00234}}
> 
> I want to make a list of just the dates from the first column and turn
> them into an unambiguous DateList[] format for later processing.
> 
> This works:
> 
> dates = DateList[{#, {"Month", "Day","YearShort"}}] & /@ (DateString
> [#] & /@Rest[dataFile][[All, 1]]);
> 
> but, with thousands of dates in a file to convert, it takes a long
> time to run.  The Map within a Map seems to slow things down a lot.
> Unless I've missed something, (quite possible) neither DateList[] nor
> DateString seem to operate directly on lists, so I haven't figured out
> a better way to do this.
> 
> Can I do anything to make this run faster?  Any solutions much
> appreciated.
> 

I don't think the two Maps are the cause. The inner one is not needed,
and the code below does the same thing with a single Map, yet it is not
much faster (I have created a longer list of just dates for my tests):

In[27]:= datelist = Table[DateString[
    DatePlus[{1981, 1, 1}, n], {"Month", "/", "Day", "/",
     "YearShort"}], {n, 0, 1000}];

In[36]:= Timing[
 res1 = DateList[{#, {"Month", "Day", "YearShort"}}] & /@ datelist;]

Out[36]= {4.563, Null}

I think DateList is rather slow here because it carries too much
overhead for "simple" and regular cases like this one. I also suspect
that it may call Java functions, which would not help if you are after
speed. The following works only for dates in exactly this format, but it
is much faster:

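In[37]:= Timing[
 res2 = Apply[
    (* two-digit year: values above 10 become 19xx, the rest 20xx *)
    {If[#1 > 10, 1900 + #1, 2000 + #1], ##2} &,
    (* rewrite "mm/dd/yy" as the string "{yy,mm,dd,0,0,0}", then evaluate *)
    ToExpression /@
     StringReplace[datelist,
      RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
       "{$3,$1,$2,0,0,0}"],
    {1}
    ];
 ]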

Out[37]= {0.015, Null}
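
To make the steps in In[37] easier to follow, here is a sketch that
traces a single date string through them one at a time. The names s and
v are only illustrative and do not appear in the code above.

```mathematica
(* Step 1: reorder "mm/dd/yy" into the text of a list, year first *)
s = StringReplace["12/31/84",
   RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
    "{$3,$1,$2,0,0,0}"]
(* "{84,12,31,0,0,0}" *)

(* Step 2: turn that text into an actual list of integers *)
v = ToExpression[s]
(* {84, 12, 31, 0, 0, 0} *)

(* Step 3: expand the two-digit year with the same rule as In[37] *)
{If[#1 > 10, 1900 + #1, 2000 + #1], ##2} & @@ v
(* {1984, 12, 31, 0, 0, 0} *)
```

Note that the year rule is an assumption about the data: two-digit
years above 10 are read as 19xx and everything else as 20xx.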

In[38]:= res1 == res2

Out[38]= True
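
If you would rather not pass data read from a file through
ToExpression, a variant along the following lines might also work. It
is only a sketch: parseDate is a hypothetical helper name, it assumes
every date is exactly "mm/dd/yy", it reuses the same two-digit-year
rule as above, and I have not timed it. dataFile is the sample list
from the original question.

```mathematica
(* parseDate is a hypothetical helper, not part of the code above *)
parseDate[s_String] := Module[{m, d, y},
  (* split "mm/dd/yy" at the slashes and read each part as an integer *)
  {m, d, y} = FromDigits /@ StringSplit[s, "/"];
  (* same assumed century rule: above 10 means 19xx, otherwise 20xx *)
  {If[y > 10, 1900 + y, 2000 + y], m, d, 0, 0, 0}]

(* drop the header row, take the first column, convert each entry *)
dates = parseDate /@ Rest[dataFile][[All, 1]];

First[dates]
(* {1984, 12, 31, 0, 0, 0} *)
```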

hth,

albert

