Re: Converting date strings to DateList format - a need for speed
- To: mathgroup at smc.vnet.net
- Subject: [mg106542] Re: Converting date strings to DateList format - a need for speed
- From: Albert Retey <awnl at gmx-topmail.de>
- Date: Fri, 15 Jan 2010 07:00:24 -0500 (EST)
- References: <hip88s$sqg$1@smc.vnet.net>
On 15.01.2010 09:16, Garapata wrote:
> I have dates in the first column of large flat files (6000+ rows).
> The flat files have a header row and may have as many as 20 columns of
> data besides the dates. A sample of a flat file follows
>
> dataFile = {{"DATES", "DATA1", "DATA2"}, {"12/31/84", 1, 1},
>   {"01/01/85", 1, 1.00239}, {"01/02/85", 0.999206, 1.00238},
>   {"01/03/85", 0.997425, 1.00238}, {"01/04/85", 0.997038, 1.00237},
>   {"01/07/85", 0.989256, 1.0071}, {"01/08/85", 1.00867, 1.00235},
>   {"01/09/85", 0.994708, 1.00235}, {"01/10/85", 1.00552, 1.00234}}
>
> I want to make a list of just the dates from the first column and turn
> them into an unambiguous DateList[] format for later processing.
>
> This works:
>
> dates = DateList[{#, {"Month", "Day","YearShort"}}] & /@ (DateString
> [#] & /@Rest[dataFile][[All, 1]]);
>
> but, with thousands of dates in a file to convert, it takes a long
> time to run. The Map within a Map seems to slow things down a lot.
> Unless I've missed something, (quite possible) neither DateList[] nor
> DateString seem to operate directly on lists, so I haven't figured out
> a better way to do this.
>
> Can I do anything to make this run faster? Any solutions much
> appreciated.
>

I don't think the two Maps are the reason. The inner one is not necessary, and you will find that the code below does the same thing with only one Map and is not much faster (I created a longer list of just dates for my tests):

In[27]:= datelist = Table[
   DateString[DatePlus[{1981, 1, 1}, n],
    {"Month", "/", "Day", "/", "YearShort"}], {n, 0, 1000}];

In[36]:= Timing[
  res1 = DateList[{#, {"Month", "Day", "YearShort"}}] & /@ datelist;]

Out[36]= {4.563, Null}

I think DateList is rather slow because it carries too much overhead for "simple" and regular cases like this. I also believe it may well make calls to Java functions, which is not a good idea either if you are after speed.

The following will only work with dates of exactly this format, but it is much faster:

In[37]:= Timing[
  res2 = Apply[
     {If[#1 > 10, 1900 + #1, 2000 + #1], ##2} &,
     ToExpression /@ StringReplace[datelist,
       RegularExpression["([0-9]*)/([0-9]*)/([0-9]*)"] ->
        "{$3,$1,$2,0,0,0}"],
     {1}];
  ]

Out[37]= {0.015, Null}

In[38]:= res1 == res2

Out[38]= True

hth,

albert
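
For completeness, here is a minimal sketch of the same idea applied directly to the first column of the sample dataFile from the question. It is only an illustration, not something timed in this thread: the helper name parseShortDate is hypothetical, and it swaps ToExpression for StringSplit plus FromDigits, since ToExpression evaluates whatever the string contains. It assumes every date is strictly "MM/DD/YY" and reuses the century rule from above (two-digit years greater than 10 are read as 19xx).

```mathematica
(* Hypothetical helper, not from the original post: parse "MM/DD/YY"
   by splitting on "/" and reading the digit strings directly,
   so no string is ever evaluated as code. *)
parseShortDate[s_String] := Module[{m, d, y},
  {m, d, y} = FromDigits /@ StringSplit[s, "/"];
  (* Same century rule as in the reply above: > 10 means 19xx *)
  {If[y > 10, 1900 + y, 2000 + y], m, d, 0, 0, 0}
]

(* Shortened copy of the sample data from the question *)
dataFile = {{"DATES", "DATA1", "DATA2"},
   {"12/31/84", 1, 1}, {"01/01/85", 1, 1.00239}};

(* Drop the header row, take the first column, convert each date *)
dates = parseShortDate /@ Rest[dataFile][[All, 1]]
(* {{1984, 12, 31, 0, 0, 0}, {1985, 1, 1, 0, 0, 0}} *)
```

Whether this is faster or slower than the StringReplace version would have to be measured with Timing on the real files.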