MathGroup Archive 2007

Re: Slow Import of CSV files

  • To: mathgroup at smc.vnet.net
  • Subject: [mg79448] Re: Slow Import of CSV files
  • From: j.f.b.payne at tesco.net
  • Date: Fri, 27 Jul 2007 05:46:03 -0400 (EDT)
  • References: <f86ren$ph5$1@smc.vnet.net><f89pti$5iq$1@smc.vnet.net>

Hi Jean-Marc

Thanks for the suggestion.  You are right, it depends on the CSV file.
For your example I get:

$Version

"6.0 for Microsoft Windows (32-bit) (June 19, 2007)"

Timing[data = Import["c:/temp/myfile.csv"];]

{52.786, Null}

$Version

"5.2 for Microsoft Windows (June 20, 2005)"

Timing[data=Import["c:/temp/myfile.csv"];]

{49.641 Second, Null}

So version 5.2 is quicker on my system, but only marginally. (The
difference from your result is probably because this is a Pentium III,
whereas you, I suppose, have a Pentium IV, which has sped up some
instructions more than others.)

However, the relevant result is with a file more like mine, which has
fairly small fixed-point numbers (your file has almost all very large
floating-point numbers, close to $MaxMachineNumber).

So

data = RandomReal[{0, 1500}, {3 10^5, 3}];

Timing[Export["c:/temp/myfile2.csv", data] ]

(I like / as the path separator because it doesn't have to be doubled
and it works on both Linux and Windows.)

The speed of version 6 is not really affected by the difference
between the two files:

In[1]:= $Version

Out[1]= "6.0 for Microsoft Windows (32-bit) (June 19, 2007)"

In[2]:= Timing[data = Import["c:/temp/myfile2.csv"];]

Out[2]= {51.885, Null}

In[3]:= MaxMemoryUsed[]

Out[3]= 139109800


But version 5.2 is about 2.5x quicker for this file:

In[1]:= $Version

Out[1]= "5.2 for Microsoft Windows (June 20, 2005)"

In[2]:= Timing[data = Import["c:/temp/myfile2.csv"];]

Out[2]= {20.239 Second, Null}

In[3]:= MaxMemoryUsed[]

Out[3]= 33143280


So it seems that the improvement in version 6 is that Import speed is
insensitive to number form!

I've now had an update from Technical Support, which says:
"Most of the differences about the importer are internal. The new
importer can import much larger files than version 5.2. The amount of
memory taken to import a large file has been reduced by almost 3
times. Parsing of dates has been considerably improved. We can import
many more variations than before."

On the face of it, the above results show that version 6 uses about 4
times _more_ memory than version 5.2, but maybe there's memory used in
a Java process in 5.2 or something?
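
For reference, the ratios behind "about 2.5x quicker" and "about 4
times more memory" can be checked directly from the outputs quoted
above. The variable names below are only illustrative; the numbers are
copied from the Timing and MaxMemoryUsed results for myfile2.csv.

```mathematica
(* Figures copied from the outputs quoted above for myfile2.csv *)
time6  = 51.885;       (* seconds, version 6.0 *)
time52 = 20.239;       (* seconds, version 5.2 *)
mem6   = 139109800;    (* bytes, MaxMemoryUsed[] in version 6.0 *)
mem52  = 33143280;     (* bytes, MaxMemoryUsed[] in version 5.2 *)

time6/time52     (* speed ratio, roughly 2.56 *)
N[mem6/mem52]    (* memory ratio, roughly 4.2 *)
```

Note that MaxMemoryUsed[] reports the peak for the kernel session only,
so memory used by any helper process outside the kernel would not show
up in either figure.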

In version 6, Import has a "DateStringFormat" option (I _think_ set to
"None" by default, but the Help table doesn't have headings), whereas
version 5.2 had DateStyle.
It would be nice if there were a way to turn off date import (maybe
"DateStringFormat"->"NoneAtAll"?) and get back the roughly 2.5x speed
advantage of version 5.2.
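
In the meantime, one possible workaround for purely numeric files is to
bypass Import and read the file with ReadList. This is only a sketch
under assumptions, not something suggested by Technical Support: it
assumes every field is a plain decimal number with no quotes, dates,
headers or exponent notation, and whether it is actually faster on a
given file would have to be timed.

```mathematica
(* Sketch: read a purely numeric CSV without going through Import.   *)
(* Assumes plain decimal fields only; a field such as 1.5e-5 would   *)
(* NOT be converted correctly by ToExpression.                       *)
file = "c:/temp/myfile2.csv";

Timing[
  (* Split each line into comma-separated strings, one list per line *)
  raw = ReadList[file, Word,
     WordSeparators -> {","}, RecordLists -> True];
  (* Convert every string to a number. ToExpression evaluates its    *)
  (* input, so use this only on files you trust.                     *)
  data2 = Map[ToExpression, raw, {2}];
]

Dimensions[data2]
```

If the assumptions hold, data2 should have the same dimensions as the
array that was exported, which gives a quick sanity check against the
result of Import.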

Regards

John Payne


