Re: out of memory reading large(?) file (Q:)
- To: mathgroup at smc.vnet.net
- Subject: [mg18841] Re: out of memory reading large(?) file (Q:)
- From: sidles at u.washington.edu (John A. Sidles)
- Date: Thu, 22 Jul 1999 22:57:43 -0400
- Organization: University of Washington, Seattle
- References: <7lumlt$lf5@smc.vnet.net> <7m14aj$pfk@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
In article <7m14aj$pfk at smc.vnet.net>,
P.J. Hinton <paulh at wolfram.com> wrote:
>In article <7lumlt$lf5 at smc.vnet.net>, "iMic" <schaferk at communique.net> writes:
>
>> i am trying to read in a large file (6.3MB) of zeroes (0.0) and ones (1.0).
>> after giving the kernel 50MB, the kernel reports out of memory. surely this
>> amount of memory should be enough. the version is Mathematica 3.0
>> on a PowerMac 6500/225 with 96MB RAM.
>> i have tried before (on much smaller files), using unformatted binary output
>> of the file along with ReadBinary, only to find unbearable long read times
>> (> 5Mins).
>
>How are the zeroes and ones stored in the data file? Are they stored
>as bits, or does each byte in the file correspond to a single zero/one?
>When you read the data from the file, do you store each entry as an
>Integer, Real, or String? Is it really necessary to store all
>of the data in the file in memory at once?
>
>--
>P.J. Hinton
>Mathematica Programming Group paulh at wolfram.com
>Wolfram Research, Inc.
>Disclaimer: Opinions expressed herein are those of the author alone.

There is a non-obvious but *very* fast and *very* memory-efficient
way to read a text file as a list of numbers. It is an order of
magnitude faster than ReadList[], and it handles more general file
formats to boot.

This idiom is well known to most Mathematica cognoscenti, and is
rediscovered every few months by frustrated users, so I guess it's
time for us users to teach it (again) to you guys at Wolfram Inc
headquarters!

    (* Open the file *)
    stream = OpenRead["fileName here"];

    (* Read it in as one long string *)
    theDataString = ReadList[stream, Record, RecordSeparators -> {}][[1]];

    (* Close the stream; ReadList[] does not close a stream
       that was already open when it was called *)
    Close[stream];

    (* Convert it to a Mathematica expression *)
    theDataArray = (
      "{{" <> StringReplace[
        (* drop the last character, assumed to be a trailing newline *)
        StringDrop[theDataString, -1],
        {"e"  -> "*10^",    (* trick for reading exponential notation! *)
         "\t" -> ",",       (* tabs separate the columns *)
         "\n" -> "},\n{"}   (* newlines separate the rows *)
      ] <> "}}") // ToExpression;

    (* All done! Release the string *)
    Clear[theDataString];

In comparison to the above, you'll find ReadList[] to be slow, buggy,
and a memory hog, to the point that (as the original poster "iMic",
and I, and many other users have found) ReadList[] is simply unusable
for importing large text files of data.

My experience is that customer support invariably suggests using
ReadBinary[], which is not an acceptable response, since many
applications export only text files.

So, I hereby challenge Wolfram Customer Support to come up with any
workaround even half as fast as the above idiom for importing big
text files. And then put the best idiom you've got up on your web
site!

By the way, PJ, how come these oft-encountered problems with
ReadList[], and their workarounds, are not retained in Wolfram's
collective memory from year to year?

Please point out to your managers that this is a good example of a
customer problem that occurs over and over again, and gets discussed
by the user group each time, but somehow evades solution at the
corporate level.

Just part of my persistent (and, I hope, stimulating to Wolfram
management) campaign for better Mathematica documentation. The above
would make a good internal case study at Wolfram Research, as an
example of how persistent customer problems with Mathematica have
been, at least in the past, handled poorly or not at all.

But hey, do I detect signs that the on-line technical FAQ is being
expanded?! Let us hope!

Best wishes ... John Sidles
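
For readers who want to see what the string rewriting in the idiom
above actually produces, here is a minimal sketch that applies the
same replacement rules to a small in-memory string instead of a file.
The names `sample`, `wrapped`, and `data` and the sample contents are
hypothetical, chosen only for illustration; the sketch assumes
tab-separated columns, newline-terminated rows, and lowercase "e" in
exponents, as the idiom itself does.

```mathematica
(* Hypothetical sample: two tab-separated rows with a trailing newline *)
sample = "1.0\t0.0\t2.5e3\n0.0\t1.0\t1.0\n";

(* Same rewriting as in the idiom: the string becomes list syntax *)
wrapped = "{{" <> StringReplace[
    StringDrop[sample, -1],          (* drop the trailing newline *)
    {"e" -> "*10^", "\t" -> ",", "\n" -> "},\n{"}] <> "}}";

(* wrapped is now the text  {{1.0,0.0,2.5*10^3},\n{0.0,1.0,1.0}}  *)
data = ToExpression[wrapped];

Dimensions[data]   (* {2, 3} *)
data               (* {{1., 0., 2500.}, {0., 1., 1.}} *)
```

Two caveats follow from how the rules work, not from anything stated
in the post: every "e" in the input is rewritten, so a header line or
any other non-numeric text would be mangled, and ToExpression[]
evaluates whatever the file contains, so the approach is best kept to
data files whose contents are trusted.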