Re: Converting date strings to DateList format - a need for speed
- To: mathgroup at smc.vnet.net
- Subject: [mg106542] Re: Converting date strings to DateList format - a need for speed
- From: Albert Retey <awnl at gmx-topmail.de>
- Date: Fri, 15 Jan 2010 07:00:24 -0500 (EST)
- References: <hip88s$sqg$1@smc.vnet.net>
On 15.01.2010 09:16, Garapata wrote:
> I have dates in the first column of large flat files (6000+ rows).
> The flat files have a header row and may have as many as 20 columns of
> data besides the dates. A sample of a flat file follows
>
> dataFile = {{"DATES", "DATA1", "DATA2"}, {"12/31/84", 1, 1},
> {"01/01/85", 1,
> 1.00239}, {"01/02/85", 0.999206, 1.00238}, {"01/03/85", 0.997425,
> 1.00238}, {"01/04/85", 0.997038, 1.00237}, {"01/07/85", 0.989256,
> 1.0071}, {"01/08/85", 1.00867, 1.00235}, {"01/09/85", 0.994708,
> 1.00235}, {"01/10/85", 1.00552, 1.00234}}
>
> I want to make a list of just the dates from the first column and turn
> them into an unambiguous DateList[] format for later processing.
>
> This works:
>
> dates = DateList[{#, {"Month", "Day","YearShort"}}] & /@ (DateString
> [#] & /@Rest[dataFile][[All, 1]]);
>
> but, with thousands of dates in a file to convert, it takes a long
> time to run. The Map within a Map seems to slow things down a lot.
> Unless I've missed something, (quite possible) neither DateList[] nor
> DateString seem to operate directly on lists, so I haven't figured out
> a better way to do this.
>
> Can I do anything to make this run faster? Any solutions much
> appreciated.
>
I don't think the two Maps are the cause. The inner one (DateString) is
not needed at all, but as the code below shows, doing the same thing
with only one Map is not much faster (I have created a longer list of
just dates for my tests):
In[27]:= datelist = Table[DateString[
DatePlus[{1981, 1, 1}, n], {"Month", "/", "Day", "/",
"YearShort"}], {n, 0, 1000}];
In[36]:= Timing[
res1 = DateList[{#, {"Month", "Day", "YearShort"}}] & /@ datelist;]
Out[36]= {4.563, Null}
I think DateList is rather slow because it carries too much overhead
for "simple" and regular cases like this. I also suspect it may make
calls to Java functions, which is not a good idea if you are after
speed. The following only works with dates in exactly this format, but
is much faster. Note that it maps two-digit years above 10 to 19xx and
all others to 20xx:
In[37]:= Timing[
res2 = Apply[
{If[#1 > 10, 1900 + #1, 2000 + #1], ##2} &,
ToExpression /@
StringReplace[datelist,
RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
"{$3,$1,$2,0,0,0}"],
{1}
];
]
Out[37]= {0.015, Null}
In[38]:= res1 == res2
Out[38]= True
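To apply the same idea to the first column of the original dataFile, something along these lines might do. It is only a sketch: fastDateList is a hypothetical helper name, and I have swapped the regular expression and ToExpression for StringSplit and FromDigits, which avoids evaluating strings read from a file. It assumes every date is exactly "MM/DD/YY" and keeps the same two-digit-year rule as above.

```mathematica
(* fastDateList is a hypothetical helper name, not a built-in. *)
(* Assumes every string is exactly "MM/DD/YY"; no validation is done. *)
fastDateList[strs_List] := Map[
  Function[s,
    With[{p = FromDigits /@ StringSplit[s, "/"]},
      (* same two-digit-year rule as above: > 10 means 19xx, else 20xx *)
      {If[p[[3]] > 10, 1900 + p[[3]], 2000 + p[[3]]], p[[1]], p[[2]], 0, 0, 0}]],
  strs]

(* shortened copy of the sample data from the question *)
dataFile = {{"DATES", "DATA1", "DATA2"}, {"12/31/84", 1, 1},
   {"01/01/85", 1, 1.00239}};

(* Rest drops the header row, [[All, 1]] picks the date column *)
dates = fastDateList[Rest[dataFile][[All, 1]]]
(* {{1984, 12, 31, 0, 0, 0}, {1985, 1, 1, 0, 0, 0}} *)
```

I have not timed this variant, so whether it matches the regular-expression version for speed is something you would need to check on your own files.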
hth,
albert