MathGroup Archive 2007

Re: how to quickly read a >10MB big file

  • To: mathgroup at smc.vnet.net
  • Subject: [mg72925] Re: how to quickly read a >10MB big file
  • From: Bill Rowe <readnewsciv at sbcglobal.net>
  • Date: Thu, 25 Jan 2007 07:15:31 -0500 (EST)

>The file consists of five note lines followed by a block
>of data (6 columns * 100000 lines). It looks like this:
>--------------------------------------------------------------
>The file was generated on Jan-01-2007

>ParameterA=0.20998977 ParameterB=-2323.898780 ParameterC=1223

>the full output is:

>-7.9777019460E-03  5.8979296313E-03 -5.8992690654E-02
>-1.9555038170E-03 -0.2143438800      0.9835566699 9.5788225640E-02
>-1.6666155312E-02 -2.3570413269E-02 8.4937134986E-04 
>-0.1289696421 0.9813171342 6.7266728621E-02 -2.7685295289E-02 
>4.8717250310E-02 1.5101454940E-02 -0.1758737132      0.9917945596
>... ... ...
>--------------------------------------------------------------

>My PC has a Pentium4 CPU and 512MB memory. I have used "Import"
>(using type Table), "ReadList" and "FindList", but all of them were
>very slow.

Import will clearly read the data but is slow because it does a 
lot of checking of data types for you. This is required to allow 
Import to deal with mixed data types.

FindList is not intended to read in large data files unless you 
want to work with strings. And even then it is more efficient to 
use ReadList or Read if you are going to read the entire data file.

The simplest way to read such a large file efficiently is to use 
an editor to move the five lines of notes into a separate file, 
leaving just numbers to be read in by ReadList. Then the time 
to read the data should be as short as your machine can manage.
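
If editing the file by hand is inconvenient, one alternative (not the approach described above, only a sketch) is to open the file as a stream, skip the note lines, and let ReadList read the remaining numbers. The file name here is hypothetical, and the sketch assumes the file starts with exactly five non-empty note lines; if the notes are separated by blank lines, check on the real file how many lines actually need to be skipped.

```mathematica
(* Hypothetical file name; substitute the real path *)
file = "data.txt";

stream = OpenRead[file];

(* Assumption: the header is exactly five lines of notes *)
Skip[stream, String, 5];

(* Read everything that remains as a flat list of numbers *)
data = ReadList[stream, Number];
Close[stream];

(* Regroup into rows of 6 columns; Partition drops any incomplete final row *)
rows = Partition[data, 6];
Dimensions[rows]
```

Reading with the Number type avoids the per-item type checking that makes Import slow, which is the same reason for stripping the notes in an editor first.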

But you indicate your machine doesn't have much memory for a 
modern operating system. I suspect with large files you will 
find Mathematica has to use virtual memory which will definitely 
slow things down.

Here is the output of a fresh session where I used Mathematica 
on my machine to read in a large data file:

In[1]:=
a=MemoryInUse[];

In[2]:=
Timing[data=ReadList["/Users/browe/Desktop/IOC Analysis/Vacuum Test II Data/Edge Data III.txt",Number];]

Out[2]=
{13.0222 Second,Null}

In[3]:=
MemoryInUse[]-a

Out[3]=
129422112

In[4]:=
Length@data

Out[4]=
7636380

As you can see, I read in about 7.6 million numbers in about 13 
seconds. But also note that this increased the amount of RAM 
Mathematica used by ~130 MB. For me, this is not an issue since 
I have 2 GB of RAM installed on my machine. I am quite certain 
that if I could run Mathematica at all alongside everything else 
in 512 MB, reading this much data on the same machine would take 
much more than 13 seconds.
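
As a rough check on these figures, the memory increase can be divided by the number of values read and scaled to the 6 columns * 100000 lines in the question. This is only a sketch: the variable names are mine, and it assumes memory use grows linearly with the number of values, which may not hold exactly.

```mathematica
(* Figures copied from Out[3] and Out[4] in the session above *)
bytesUsed = 129422112;
numbersRead = 7636380;

(* Average memory cost per number read: roughly 17 bytes *)
bytesPerNumber = N[bytesUsed/numbersRead]

(* Estimate in MB for 6 columns * 100000 lines, assuming linear scaling *)
estimateMB = 6*100000*bytesPerNumber/10.^6
```

The estimate covers only the list that ReadList returns, not any temporary memory used while the file is being read.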
--
To reply via email subtract one hundred and four

