MathGroup Archive 1998

Reading binary files

  • To: mathgroup at smc.vnet.net
  • Subject: [mg14652] Reading binary files
  • From: jenningsj at mail.utexas.edu (Jim Jennings)
  • Date: Sat, 7 Nov 1998 02:10:01 -0500
  • Organization: University of Texas at Austin
  • Sender: owner-wri-mathgroup at wolfram.com

I recently obtained the MathLink applications MathHDF and FastBinary
from MathSource in my quest to get large data files into Mathematica
faster.  Both packages needed updating; they did not contain PowerMac
native applications, and MathHDF did not work at all on my system:

PowerMac 7100/80
104MB RAM, virtual memory set to 105MB
System 7.5.5
Mathematica 3.0.1

With some tinkering I was able to recompile both packages into PowerMac
native applications using the most recent MathLink goodies (the
developer's kit that came on the Mathematica CD, plus the updated
mathlink.h and SAmprep downloaded from the Wolfram Web site) and
CodeWarrior Pro 2 (CodeWarrior IDE 2.1).  The new MathHDF uses HDF
4.1r1.  The new MathHDF has been submitted to MathSource; the new
FastBinary will be submitted soon.

Here are the results of a simple benchmark reading a large array into
Mathematica using various methods on the computer described above:

Function        Source                                    Elapsed time
ReadListBinary  FastBinary (ppc)                            23 seconds
ReadSDS         MathHDF (ppc)                               28 seconds
ReadListBinary  FastBinary (68k)                           189 seconds
ReadList        built-in function reading ASCII text       939 seconds
ReadListBinary  standard package Utilities`BinaryFiles`   9597 seconds (estimated)

The times are elapsed times, not CPU times.  I was careful not to do
anything else with the computer while the benchmarks were running.  The
files contained a 21 by 30 by 300 array of 4-byte real numbers.  The
result for ReadListBinary from the package Utilities`BinaryFiles` is
actually an estimate; a single 30 by 300 slice of the array was read and
the elapsed time multiplied by 21.
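
For anyone who wants to check the arithmetic behind these figures, here
is a small Python sketch (Python is used only for illustration, and the
variable names are made up for this example).  It derives the number
count and raw payload size from the stated dimensions, and inverts the
extrapolation to recover the time implied for the single slice.

```python
# Figures taken from the post; variable names are illustrative only.
dims = (21, 30, 300)        # slices x rows x columns
bytes_per_real = 4

numbers_total = dims[0] * dims[1] * dims[2]
numbers_per_slice = dims[1] * dims[2]
payload_bytes = numbers_total * bytes_per_real

# The 9597 s figure was extrapolated from one slice, so invert it.
estimated_total_s = 9597
implied_slice_s = estimated_total_s / dims[0]

print(numbers_total)      # 189000
print(numbers_per_slice)  # 9000
print(payload_bytes)      # 756000
print(implied_slice_s)    # 457.0
```

The raw payload of 756,000 bytes is close to the 741K file size reported;
the post does not say how that size was measured, so the small
difference is left unexplained here.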

ReadSDS read from a 741K HDF file containing the 3D array.
ReadListBinary read from a 741K binary file containing 189,000 numbers
in a single list (except for the Utilities`BinaryFiles` test, which read
from a file with 9,000 numbers).  ReadList read a 2.5MB ASCII text file
with 189,000 numbers.  The ASCII text file was created with
Mathematica; the numbers ranged from 2 to 18 characters long.

The ppc native ReadListBinary and ReadSDS read the file in about the
same time. This is satisfying, but it still seems somewhat slow to me
for files of this size on this computer.  Can anyone explain it?
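
To put "somewhat slow" into numbers, the sketch below (Python, for
illustration only) turns the two ppc timings into numbers per second and
kilobytes per second.  It assumes the payload is exactly 21 x 30 x 300
four-byte reals and ignores any file header, so treat the results as
rough rates.  They do not show where the time is being spent.

```python
# Rough throughput for the two ppc-native readers, from the table above.
numbers_total = 21 * 30 * 300           # 189,000 reals
payload_bytes = numbers_total * 4       # assumes no header overhead

elapsed_s = {"FastBinary (ppc)": 23, "MathHDF (ppc)": 28}

numbers_per_s = {name: numbers_total / t for name, t in elapsed_s.items()}
kib_per_s = {name: payload_bytes / t / 1024 for name, t in elapsed_s.items()}

for name in elapsed_s:
    print(f"{name}: {numbers_per_s[name]:.0f} numbers/s, "
          f"{kib_per_s[name]:.1f} KiB/s")
```

That works out to roughly 8,200 numbers per second (about 32 KiB/s) for
FastBinary and 6,750 numbers per second (about 26 KiB/s) for MathHDF.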

The ppc native ReadListBinary has a factor of 8 advantage over the 68k
version, which makes sense since the 68k version was running in
emulation.  It looks like it will be worth the trouble to submit the
updated FastBinary to MathSource.

The ppc native ReadListBinary has a factor of 41 advantage over
ReadList.  For those of you wanting to read large files it will be well
worth the trouble to convert your files to binary (or better, make them
that way in the first place) and use FastBinary or MathHDF.
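
As a rough illustration of the conversion step, here is a minimal Python
sketch that turns a whitespace-separated text file of numbers into raw
4-byte reals.  It is not part of FastBinary or MathHDF.  The function
name, the big-endian default, and the assumption that every token is a
plain decimal number that Python's float() accepts are all mine; text
written by Mathematica may use other notations and would need extra
parsing.

```python
import os
import struct
import tempfile

def ascii_to_float32(text_path, bin_path, byteorder=">"):
    """Convert whitespace-separated numbers to raw 4-byte reals.

    Hypothetical helper: byteorder ">" means big-endian, "<" little-endian.
    Returns the count of numbers written.
    """
    count = 0
    with open(text_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            numbers = [float(tok) for tok in line.split()]
            dst.write(struct.pack(f"{byteorder}{len(numbers)}f", *numbers))
            count += len(numbers)
    return count

# Small demonstration with values that are exact in single precision.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "data.txt")
    bin_path = os.path.join(tmp, "data.bin")
    with open(text_path, "w") as f:
        f.write("1.5 2.25\n-3.0\n")
    count = ascii_to_float32(text_path, bin_path)
    size_bytes = os.path.getsize(bin_path)
    with open(bin_path, "rb") as f:
        values = struct.unpack(f">{count}f", f.read())
```

Each number takes exactly 4 bytes in the output, so the three numbers
above give a 12-byte file.  Values that are not exactly representable in
single precision will be rounded on conversion.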

The result for ReadListBinary from Utilities`BinaryFiles` is outrageous!
Did I do something wrong?  Anyone attempting to get faster reads by
converting their ASCII files to binary and using this solution will get
a nasty surprise; it will be more than a factor of 10 slower than
ReadList on the original ASCII file!  Can anyone explain this seemingly
absurd behavior?
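
The factors quoted in the last three paragraphs can be rechecked from
the benchmark table with a few lines of Python (illustration only; the
dictionary keys are shorthand labels of my own, not package names).

```python
# Elapsed times in seconds, copied from the benchmark table.
elapsed_s = {
    "FastBinary ppc": 23,
    "MathHDF ppc": 28,
    "FastBinary 68k": 189,
    "ReadList ASCII": 939,
    "Utilities BinaryFiles": 9597,   # extrapolated from one slice
}

def slowdown(slow, fast):
    """How many times longer `slow` took than `fast`."""
    return elapsed_s[slow] / elapsed_s[fast]

ppc_vs_68k = slowdown("FastBinary 68k", "FastBinary ppc")
ppc_vs_readlist = slowdown("ReadList ASCII", "FastBinary ppc")
utilities_vs_readlist = slowdown("Utilities BinaryFiles", "ReadList ASCII")

print(f"{ppc_vs_68k:.1f}")             # 8.2
print(f"{ppc_vs_readlist:.1f}")        # 40.8
print(f"{utilities_vs_readlist:.1f}")  # 10.2
```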

-- 
Jim Jennings                  Research Associate
jenningsj at mail.utexas.edu  Bureau of Economic Geology
(512) 471-4364 (voice)        University of Texas at Austin
(512) 471-0140 (fax)

