MathGroup Archive 2007

Re: ReadList -- file size limits?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg82859] Re: ReadList -- file size limits?
  • From: David Bailey <dave at Remove_Thisdbailey.co.uk>
  • Date: Thu, 1 Nov 2007 05:20:50 -0500 (EST)
  • References: <fg9pau$moq$1@smc.vnet.net>

david.sedarsky at forbrf.lth.se wrote:
> Is ReadList limited in some way regarding the amount of data it can read 
> in?
> 
> I have a large data file (~450M text file) and I'm doing something like 
> the following:
> 
> stream = OpenRead[fullname]; (*open file for reading*)
> header = ReadList[stream, String, 1](*read header strings*)
> header = Flatten[StringSplit[header]];
> data = ReadList[stream, Number, 8(1*10^6)];(*read 8 columns of number 
> data*)
> Close[stream];
> 
> Apparently ReadList doesn't read in the whole file.  The code shown above
> runs fine, but data has max length of: 3278320, which corresponds to 
> 409790 lines of the data file.  Can anyone clue me in as to why this is? 
> Is this a suitable approach for reading numbers from a very large ascii 
> file?
> 
> thanks,
> 
> DS
> 
First, I would check that your data file is not corrupted in some way. I 
have seen data files like this generated by equipment of various sorts 
that threw the occasional glitch!
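
One quick way to check is to print the raw lines around the point where 
ReadList stopped. This is only a sketch: it assumes the file has a single 
header line followed by one row of eight numbers per line, takes the skip 
count from the 409790 rows you reported, and reuses the fullname variable 
from your code.

```mathematica
(* Assumption: file line 1 is the header; lines 2..409791 are the rows that were read *)
stream = OpenRead[fullname];
Skip[stream, String, 409789];          (* skip file lines 1..409789 *)
lines = ReadList[stream, String, 5];   (* file lines 409790..409794 *)
Close[stream];

(* InputForm prints each string quoted, with escapes for special characters *)
Scan[Print[InputForm[#]] &, lines]
```

If the first line after row 409790 looks different from its neighbours 
(missing columns, stray text, odd characters), that would point to the 
file rather than to ReadList.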

Anyway, 450M is a really large file, and if Mathematica reads it all in 
at once, that is a very large chunk of memory used before any processing 
starts. On a 32-bit operating system (such as 32-bit Windows), that 
would already consume a fair bit of your addressable memory. There may 
also be some hard-coded limits in Mathematica.

Since ReadList can take a stream argument, which stays open after the 
call, you could read the data in chunks, and assemble it afterwards - or 
even process it in chunks and avoid holding it all in memory at the same 
time.
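
A minimal sketch of the chunked approach follows. The chunk size and the 
names chunkSize, chunk and chunks are arbitrary choices for illustration, 
fullname is the path from your code, and the loop assumes that everything 
after the header is whitespace-separated numbers in rows of eight.

```mathematica
chunkSize = 8*100000;   (* hypothetical: 100,000 rows of 8 numbers per call *)

stream = OpenRead[fullname];
header = StringSplit[Read[stream, String]];   (* first line, split into words *)

chunks = {};
While[True,
  chunk = ReadList[stream, Number, chunkSize];
  (* stop at end of file, or if anything non-numeric comes back *)
  If[chunk === {} || !VectorQ[chunk, NumberQ], Break[]];
  (* process the chunk here instead if you do not need all the data at once *)
  AppendTo[chunks, Partition[chunk, 8]]   (* Partition drops an incomplete last row *)
];
Close[stream];

data = Join @@ chunks;   (* all rows, 8 numbers each *)
```

Comparing Length[data] with the number of lines in the file would show 
whether the read still stops early.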

As a last resort, you could open and read the file in Java (using 
J/Link).

It is hard to be specific without seeing the file - which would probably 
be a bit large to append to your message:)

David Bailey
http://www.dbaileyconsultancy.co.uk

