MathGroup Archive 1999

Re: out of memory reading large(?) file (Q:)

  • To: mathgroup at smc.vnet.net
  • Subject: [mg18841] Re: out of memory reading large(?) file (Q:)
  • From: sidles at u.washington.edu (John A. Sidles)
  • Date: Thu, 22 Jul 1999 22:57:43 -0400
  • Organization: University of Washington, Seattle
  • References: <7lumlt$lf5@smc.vnet.net> <7m14aj$pfk@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

In article <7m14aj$pfk at smc.vnet.net>, P.J. Hinton <paulh at wolfram.com> wrote:
>In article <7lumlt$lf5 at smc.vnet.net>, "iMic" <schaferk at communique.net> writes:
>
>> i am trying to read in a large file (6.3MB) of zeroes (0.0) and ones (1.0).
>> after giving the kernel 50MB, the kernel reports out of memory. surely this
>> amount of memory should be enough. the version is Mathematica 3.0 
>> on a PowerMac 6500/225 with 96MB RAM.
>> i have tried before (on much smaller files), using unformatted binary output
>> of the file along with ReadBinary, only to find unbearable long read times
>> (> 5Mins).
>
>How are the zeroes and ones stored in the data file?  Are they stored 
>as bits, or does each byte in the file correspond to a single zero/one?
>When you read the data from the file, do you store each entry as an
>Integer, Real, or String?  Is it really necessary to store all
>of the data in the file in memory at once?
>
>--
>P.J. Hinton	
>Mathematica Programming Group		paulh at wolfram.com
>Wolfram Research, Inc.
>Disclaimer: Opinions expressed herein are those of the author alone.

There is a non-obvious but *very* fast and *very* memory-efficient
way to read a text file as a list of numbers. It is an order of
magnitude faster than ReadList[], and handles more general file
formats to boot.  This idiom is well known to most Mathematica
cognoscenti, and is rediscovered every few months by frustrated
users, so I guess it's time for us users to teach it (again) to
you guys at Wolfram Research headquarters!

  (* Open the file *)
      stream = OpenRead["fileName here"];
   
  (* Read it in as one long string, then close the stream *)
      theDataString = ReadList[stream,Record,RecordSeparators->{}][[1]];
      Close[stream];
   
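  (* Convert it to a Mathematica expression.  This assumes tab-separated
     columns and a file that ends in a newline, which StringDrop removes
     so that no empty row is produced. *)
      theDataArray = (
      "{{"<>StringReplace[
           StringDrop[theDataString,-1],
           {"e"->"*10^",  (* trick for reading exponential notation! *)
            "\t"->",",    (* column separator becomes a list comma *)
            "\n"->"},\n{" (* each line becomes its own sublist *)
            }] <> "}}") //ToExpression;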
   
  (* All done!  Release the string *)
      Clear[theDataString];
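
The conversion step above can also be sketched as a small function that
works on a string directly, which makes it easy to try on a few lines
before pointing it at a large file. This is only a sketch under stated
assumptions: the name parseNumericText is hypothetical and not from the
post, the input is assumed to be tab-separated numbers with "\n" line
endings, and the extra rule for an upper-case "E" exponent and the check
for a missing trailing newline are additions to the idiom as posted.

```mathematica
(* Hypothetical helper, not part of the original post: turn tab-separated
   numeric text held in a string into a list of rows. *)
parseNumericText[text_String] :=
  Module[{body = text},
    (* Drop a single trailing newline only if one is present, so a file
       without a final newline does not lose its last character. *)
    If[StringLength[body] > 0 && StringTake[body, -1] == "\n",
      body = StringDrop[body, -1]];
    (* Rewrite the text into Mathematica list syntax and evaluate it. *)
    ToExpression[
      "{{" <> StringReplace[body,
        {"e" -> "*10^",   (* exponential notation, as in the post *)
         "E" -> "*10^",   (* assumed variant: upper-case exponent marker *)
         "\t" -> ",",     (* column separator *)
         "\n" -> "},{"    (* row separator *)
        }] <> "}}"]]

parseNumericText["1\t2\n3\t4\n"]   (* gives {{1, 2}, {3, 4}} *)
```

Because ToExpression evaluates whatever text it is given, this approach
is only suitable for files whose contents are trusted, and any stray
letter "e" or "E" in the data (a header line, for instance) would be
rewritten along with the exponents.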

In comparison to the above, you'll find ReadList[] to be
slow, buggy, and a memory hog, to the point that (as the
original poster "iMic", and I, and many other users have found)
ReadList[] is simply unusable for importing large text files
of data.  My experience is that customer support invariably
suggests using ReadBinary[], which is not an acceptable
response, since many applications export only text files.
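
One hedged way to check the speed claim on your own data is to time both
routes on the same file and confirm that they return the same numbers.
The helper name compareReaders below is hypothetical, and the sketch
assumes a tab-separated numeric file that ends in a newline; it makes no
claim about what the timings will be on any particular machine or
Mathematica version.

```mathematica
(* Hypothetical helper, not part of the original post: time ReadList
   against the string-based idiom on the same file. *)
compareReaders[file_String] :=
  Module[{stream, text, viaReadList, viaString},
    (* Route 1: built-in numeric reader, one sublist per line of the file *)
    viaReadList = Timing[ReadList[file, Number, RecordLists -> True]];
    (* Route 2: read the whole file as one string, rewrite, evaluate *)
    viaString = Timing[
      stream = OpenRead[file];
      text = ReadList[stream, Record, RecordSeparators -> {}][[1]];
      Close[stream];
      ToExpression[
        "{{" <> StringReplace[StringDrop[text, -1],
          {"e" -> "*10^", "\t" -> ",", "\n" -> "},{"}] <> "}}"]];
    (* Return: ReadList time, idiom time, and whether the results agree *)
    {First[viaReadList], First[viaString],
     Last[viaReadList] == Last[viaString]}]
```

Calling compareReaders["fileName here"] returns the two CPU times in
seconds followed by True when both routes produced equal data.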

So, I hereby challenge Wolfram Customer Support to come up 
with any workaround even half as fast as the above idiom 
for importing big text files.  And then put the best idiom
you've got up on your web site!

By the way, PJ, how come these oft-encountered problems
with ReadList[], and their workarounds, are not retained in
Wolfram's collective memory from year to year? Please point
out to your managers that this is a good example of a
customer problem that occurs over and over again, and gets
discussed by the user group each time, but somehow evades
solution at the corporate level.

This is just part of my persistent campaign (stimulating to
Wolfram management, I hope) for better Mathematica
documentation.  The above would make a good internal case
study at Wolfram Research, as an example of how persistent
customer problems with Mathematica have been handled poorly
or not at all, at least in the past.

But hey, do I detect signs that the on-line technical FAQ
is being expanded?!  Let us hope!

Best wishes ... John Sidles


