MathGroup Archive 1998

Re: Binary Files

  • To: mathgroup at smc.vnet.net
  • Subject: [mg13972] Re: Binary Files
  • From: tgayley at mcs.net (Todd Gayley)
  • Date: Fri, 11 Sep 1998 15:07:00 -0400
  • Organization: MCSNet Services
  • References: <6ssvcd$m93@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

On 5 Sep 1998 23:26:05 -0400, Jason Gill <jgill at vbimail.champlain.edu>
wrote:

>    I am using the following function to read binary files.  I would not
>consider the data files very large, and yet the process is extremely
>slow.  I am running Mathematica 3.0 for win95.  I have installed the
>MathLink program FastReadBinary Files, which helps a ton, but it is
>still too slow.  The net is that I use this function on multiple files
>to create a three dimensional data matrix with Dimensions on the order
>of [6,1000,300].  Mathematica handles the data O.K. once loaded, but
>the loading takes too long.  For the aforementioned matrix, loading the
>files can take upwards of 15 minutes, which seems excessive to me.
>Using the Timing function, the largest consumer of time is the actual
>binary read. Does anyone have any suggestions? I also have access to
>Mathematica (3.0) running on an RS/6000. Is there a MathLink equivalent
>to Fast Read Binary available for this platform? Any suggestions would
>be greatly appreciated.
>
>Options[ReadandConvertLMX]={SetDirPath->"d:\lmxdata\janussq"};
>
>
> ReadandConvertLMX[filename_,opts___]:=
>Module[{inline,setpath,convert6,waste,
>  },
>directory=SetDirPath /. {opts} /. Options[ReadandConvertLMX];
>SetDirectory[directory];
>  inline = OpenReadBinary[filename];
> {modules,parameters,timestep,powerHour}=
>      ReadListBinary[inline,Int16,4,
>ByteOrder->MostSignificantByteFirst];
>waste=ReadListBinary[inline,Byte,120];
>data=ReadListBinary[inline,Byte];
>data=Take[Partition[data,4],(modules*parameters)]; Close[inline];
>ResetDirectory[];
>convert6 = Compile[{b1, b2, b3, b4},  If[b1 < 128,
>    (b2*256^2+b3*256+b4)/256^3  16^(b1-64),
>         -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]]; ResetDirectory[];
>data=Transpose[Partition[Apply[convert6,data,2],parameters]];
>Dimensions[data]];
>
>--
>Jason Gill
>IBM Microelectronics
>Essex Junction, VT 05452
>Phone   (802)769-3350
>Fax     (802)769-1220
>email:  jgill at vbimail.champlain.edu

Jason,

Your timing seems a little excessive, but it's not clear to me exactly
how many bytes you are reading. On my 180 MHz Pentium Pro, it takes
FastBinaryFiles about 40 sec. to read 10^6 byte-sized numbers. This is
almost entirely the MathLink transfer time--reading the file is very
fast.
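
For a rough sense of scale, the sketch below applies that rate to the
matrix described in the question. It assumes that every element of the
6 x 1000 x 300 matrix is built from four bytes (as the Partition[data,4]
step suggests) and that the 40 seconds per 10^6 bytes figure scales
linearly to other sizes. Both are assumptions, and the variable names
are illustrative only.

```mathematica
(* Back-of-the-envelope estimate; all names here are illustrative. *)
elements = 6*1000*300;                  (* 1800000 matrix elements *)
bytesPerElement = 4;                    (* each value comes from 4 bytes *)
totalBytes = elements*bytesPerElement;  (* 7200000 bytes *)
secondsPerByte = 40/10^6;               (* rate quoted above *)
estimatedMinutes = N[totalBytes*secondsPerByte/60]   (* about 4.8 *)
```

Under these assumptions the MathLink transfer alone would account for
roughly five minutes, which is why 15 minutes looks somewhat high.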

You will be happy to know that there is a much, much faster way to read
your data than using the FastBinaryFiles MathLink program. Just use the
built-in ReadList function with the Byte type. I've rewritten your
program using ReadList to get the main data. You still need to get a
few numbers as Int16, so I just use the standard Utilities`BinaryFiles`
package for this. That package has a flaw that causes it to be very
slow, but that isn't an issue for the small header data that you need
to read.

Needs["Utilities`BinaryFiles`"]

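(* setpath is dropped from the local variables because nothing uses it;
   opts is kept so that existing calls still work. *)
ReadandConvertLMX[filename_, opts___] :=
  Module[{inline, convert6, waste},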
    (* I took out the directory manipulation. I assume that was a
        workaround for a Windows-specific bug in FastBinaryFiles.
        OpenRead searches $Path, which is probably all you need.
    *)
    (* Using the (new in 3.0) DOSTextFormat->False option
       is critical.
    *)
    inline = OpenRead[filename, DOSTextFormat->False];
    {modules,parameters,timestep,powerHour} =
          ReadListBinary[inline, Int16, 4,
              ByteOrder->MostSignificantByteFirst];
    waste=ReadListBinary[inline, Byte, 120];
    (* Use the built-in ReadList function to get the main data.
       Very fast.
    *)
    data=ReadList[inline,Byte];
    Close[inline];
    data=Take[Partition[data,4],(modules*parameters)];
    convert6 = Compile[{b1, b2, b3, b4},  If[b1 < 128,
         (b2*256^2+b3*256+b4)/256^3  16^(b1-64),
          -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]];
    data = Transpose[Partition[Apply[convert6, data, 2], parameters]];
    Dimensions[data]
]
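
To make the byte arithmetic in convert6 easier to check by hand, here
is a minimal sketch of the same formula in exact arithmetic. The name
bytesToNumber is hypothetical and is not part of any package. The
layout it decodes (a sign bit, a base-16 exponent offset by 64, and a
24-bit fraction) looks like IBM hexadecimal floating point, but that
identification is an assumption; the formula itself is taken directly
from the code above.

```mathematica
(* Hypothetical helper: the convert6 formula in exact arithmetic. *)
bytesToNumber[{b1_, b2_, b3_, b4_}] :=
  If[b1 < 128,
     (b2*256^2 + b3*256 + b4)/256^3 * 16^(b1 - 64),
    -(b2*256^2 + b3*256 + b4)/256^3 * 16^(b1 - 192)]

bytesToNumber[{65, 16, 0, 0}]     (* 1 *)
bytesToNumber[{64, 128, 0, 0}]    (* 1/2 *)
bytesToNumber[{193, 16, 0, 0}]    (* -1 *)
```

The first byte carries the sign (values of 128 and above are negative)
and the exponent; the remaining three bytes form a fraction between 0
and 1.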

If this can be done so easily and quickly without using FastBinaryFiles,
then you might ask why FastBinaryFiles was written in the first place.
Two reasons:

1) On Windows, Mathematica simply could not read binary files. It always
opened files in text mode, and thus it always interpreted the bytes in
a special way. In particular, it would read a CR-LF combination as a
single byte, and it stopped at the first 0x1A byte it read, as this is
the EOF indicator in a text file.

2) As mentioned above, the Utilities`BinaryFiles` package has a flaw
that causes it to take astronomically long to read large files.
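
The text-mode behaviour described in point 1 can be imitated on a plain
list of byte values, which shows why binary data came back corrupted.
This is only a sketch: textModeView is a hypothetical helper that works
on a list and is not a file API, and it assumes that a CR-LF pair
collapses to a single LF.

```mathematica
(* Hypothetical helper: mimics on a list of byte values what a
   text-mode read is described as doing. *)
textModeView[bytes_List] :=
  Module[{truncated},
    (* Stop at the first 0x1A (decimal 26), treated as end of file. *)
    truncated = bytes /. {a___, 26, ___} :> {a};
    (* Collapse every CR-LF pair (13, 10) into one byte, assumed LF. *)
    truncated //. {a___, 13, 10, b___} :> {a, 10, b}
  ]

textModeView[{65, 13, 10, 66, 26, 67}]   (* {65, 10, 66} *)
```

Six bytes go in and three come out, so any fixed-width binary record
that happens to contain these values is shifted or cut short.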

Motivation 1 was made irrelevant in Mathematica 3.0, which provides the
DOSTextFormat->False option in OpenRead. Motivation 2 is still valid.
FastBinaryFiles is much faster than the Utilities`BinaryFiles` package.
It should be used whenever you need the convenience of a high-level
interface for reading multi-byte types (Int16, Double, etc.). For
reading straight bytes from a file, though, just open the file with
DOSTextFormat->False and use ReadList[stream, Byte].
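
If only a handful of multi-byte values are needed, they can also be
assembled by hand from bytes read this way. The sketch below combines
pairs of bytes, most significant byte first, into signed 16-bit
integers. The name int16FromBytes and the sample header bytes are made
up for illustration, and treating the values as signed is an
assumption about the file format.

```mathematica
(* Hypothetical helper: combine two bytes, most significant first,
   into a signed 16-bit integer. *)
int16FromBytes[{hi_, lo_}] :=
  With[{u = 256*hi + lo},
    If[u >= 32768, u - 65536, u]]

(* Example header bytes, invented for illustration. In practice they
   would come from a byte read such as ReadList[stream, Byte, 8]. *)
headerBytes = {0, 6, 1, 44, 0, 10, 255, 255};
Map[int16FromBytes, Partition[headerBytes, 2]]   (* {6, 300, 10, -1} *)
```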


---Todd Gayley


