Re: ReadList -- file size limits?
- To: mathgroup at smc.vnet.net
- Subject: [mg82878] Re: ReadList -- file size limits?
- From: Aranthon <a.dwarf at gmail.com>
- Date: Fri, 2 Nov 2007 03:27:33 -0500 (EST)
- References: <fg9pau$moq$1@smc.vnet.net><fgcafp$9ml$1@smc.vnet.net>
This gets back to a question I asked a few weeks ago: does Mathematica have any hard-coded file-size limits? I have an enormous file (the results of a simulation, basically a 104 x 108 x 21 x 5500 block of 32-bit floats in binary format) that I'd like to analyze in Mathematica. Since the file weighs in at about 5 GB, I obviously won't be reading it in whole. But when I simply try to open the file, Mathematica immediately tells me that I'm at the end of the file, before I have read anything. I know that the file isn't corrupt, because I can read it perfectly well with another CAS package.

It's not a show-stopper, since I wrote a little C++ program that extracts a given range of steps and saves them in a new file. I'm just curious about what's causing the problem, and how large a file can get before Mathematica won't even look at it.

Cheers,
Greg

On Nov 1, 6:40 am, David Bailey <dave at Remove_Thisdbailey.co.uk> wrote:
> david.sedar... at forbrf.lth.se wrote:
> > Is ReadList limited in some way regarding the amount of data it can read
> > in?
> >
> > I have a large data file (~450M text file) and I'm doing something like
> > the following:
> >
> > stream = OpenRead[fullname]; (*open file for reading*)
> > header = ReadList[stream, String, 1]; (*read header strings*)
> > header = Flatten[StringSplit[header]];
> > data = ReadList[stream, Number, 8*10^6]; (*read 8 columns of number data*)
> > Close[stream];
> >
> > Apparently ReadList doesn't read in the whole file. The code shown above
> > runs fine, but data has a maximum length of 3278320, which corresponds to
> > 409790 lines of the data file. Can anyone clue me in as to why this is?
> > Is this a suitable approach for reading numbers from a very large ASCII
> > file?
> >
> > thanks,
> >
> > DS
>
> First, I would check that your data file is not corrupted in some way. I
> have seen data files like this generated by equipment of various sorts
> that threw the occasional glitch!
>
> Anyway, 450M is a really large file, and if Mathematica reads it all in
> before processing, that is a very large chunk of memory before it even
> starts processing. If you are using a 32-bit operating system (such as
> 32-bit Windows), that will have already consumed a fair bit of your
> addressable memory. There may also be some hard-coded limits in Mathematica.
>
> Since ReadList can take a stream argument, which stays open after the
> call, you could read the data in chunks and assemble it afterwards, or
> even process it in chunks and avoid holding it all in memory at the same
> time.
>
> An ultimate solution would be to open and read the file in Java (using
> J/Link).
>
> It is hard to be specific without seeing the file, which would probably
> be a bit large to append to your message :)
>
> David Bailey
> http://www.dbaileyconsultancy.co.uk
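For scale: 104 x 108 x 21 x 5500 = 1,297,296,000 floats, and at 4 bytes each that is 5,189,184,000 bytes, which is more than 2^32 = 4,294,967,296 bytes. Any code that seeks into a file like this therefore needs 64-bit offsets. A step-range extractor along the lines Greg describes might look like the C++ sketch below. It is not Greg's program. It assumes (hypothetically) that the time step is the outermost dimension, so that each step is one contiguous block of 104 x 108 x 21 = 235,872 floats (943,488 bytes), and that the floats are stored in the reading machine's native byte order. The name extractSteps is illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical layout: the file is a sequence of time steps, each stored as
// one contiguous block of `floatsPerStep` 32-bit floats in native byte order
// (104 * 108 * 21 = 235872 floats per step for the file described above).
// Copies steps [firstStep, firstStep + numSteps) from inPath to outPath.
// Returns false if either file cannot be opened or the input ends early.
bool extractSteps(const std::string& inPath, const std::string& outPath,
                  std::uint64_t floatsPerStep, std::uint64_t firstStep,
                  std::uint64_t numSteps) {
    std::ifstream in(inPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary);
    if (!in || !out) return false;

    // Do the offset arithmetic in 64 bits: late steps of a ~5 GB file lie
    // beyond 2^32 bytes and would overflow a 32-bit integer.
    const std::uint64_t bytesPerStep = floatsPerStep * sizeof(float);
    in.seekg(static_cast<std::streamoff>(firstStep * bytesPerStep), std::ios::beg);
    if (!in) return false;

    // Copy one step at a time, so only a single step is ever held in memory.
    std::vector<char> buffer(static_cast<std::size_t>(bytesPerStep));
    for (std::uint64_t s = 0; s < numSteps; ++s) {
        if (!in.read(buffer.data(), static_cast<std::streamsize>(bytesPerStep)))
            return false;  // the file ended before the requested range did
        out.write(buffer.data(), static_cast<std::streamsize>(bytesPerStep));
    }
    return static_cast<bool>(out);
}
```

For the file above, a call such as extractSteps("sim.bin", "steps.bin", 104 * 108 * 21, 1000, 50) would copy the zero-based steps 1000 through 1049. Both file names are placeholders. The resulting smaller file could then be read in Mathematica.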
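David's chunking advice refers to calling ReadList repeatedly on an open Mathematica stream. The C++ sketch below shows the same chunk-at-a-time pattern for a binary file of 32-bit floats. It is not Mathematica code and was not proposed in the thread. Each pass reads at most chunkFloats values, updates a running total, and reuses the buffer, so memory use depends on the chunk size rather than the file size. The running mean is a hypothetical stand-in for whatever per-chunk processing is actually needed, and native byte order is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Accumulator for one pass over the data (a hypothetical example statistic).
struct RunningMean {
    double sum = 0.0;
    std::uint64_t count = 0;
    double mean() const { return count ? sum / static_cast<double>(count) : 0.0; }
};

// Streams a file of native-endian 32-bit floats in chunks of `chunkFloats`
// values. Only one chunk is held in memory at a time.
RunningMean meanInChunks(const std::string& path, std::size_t chunkFloats) {
    RunningMean acc;
    if (chunkFloats == 0) return acc;  // a zero-size read would never reach EOF
    std::ifstream in(path, std::ios::binary);
    std::vector<float> chunk(chunkFloats);
    while (in) {
        in.read(reinterpret_cast<char*>(chunk.data()),
                static_cast<std::streamsize>(chunkFloats * sizeof(float)));
        // gcount() is the number of bytes the last read delivered; the final
        // chunk is usually shorter than a full one.
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(float);
        for (std::size_t i = 0; i < got; ++i) acc.sum += chunk[i];
        acc.count += got;
    }
    return acc;
}
```

The chunk size trades memory use against the number of read calls. A chunk of a few million values stays well within a 32-bit address space while keeping the per-call overhead small.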