Re: Binary Files
- To: mathgroup at smc.vnet.net
- Subject: [mg13972] Re: Binary Files
- From: tgayley at mcs.net (Todd Gayley)
- Date: Fri, 11 Sep 1998 15:07:00 -0400
- Organization: MCSNet Services
- References: <6ssvcd$m93@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
On 5 Sep 1998 23:26:05 -0400, Jason Gill <jgill at vbimail.champlain.edu>
wrote:
> I am using the following function to read binary files. I would not
>consider the data files very large, and yet the process is extremely
>slow. I am running Mathematica 3.0 for Win95. I have installed the
>MathLink program FastReadBinary Files, which helps a ton, but it is
>still too slow. The net is that I use this function on multiple files to
>create a three-dimensional data matrix with Dimensions on the order of
>[6,1000,300]. Mathematica handles the data O.K. once loaded, but the
>loading takes too long. For the aforementioned matrix, loading the
>files can take upwards of 15 minutes, which seems excessive to me.
> Using the Timing function, the largest consumer of time is the actual
>binary read. Does anyone have any suggestions? I also have access to
>Mathematica (3.0) running on an RS/6000. Is there a MathLink equivalent
>to Fast Read Binary available for this platform? Any suggestions would
>be greatly appreciated.
>
>Options[ReadandConvertLMX]={SetDirPath->"d:\lmxdata\janussq"};
>
>
> ReadandConvertLMX[filename_,opts___]:=
>Module[{inline,setpath,convert6,waste,
> },
>directory=SetDirPath /. {opts} /. Options[ReadandConvertLMX];
>SetDirectory[directory];
> inline = OpenReadBinary[filename];
> {modules,parameters,timestep,powerHour}=
> ReadListBinary[inline,Int16,4,
>ByteOrder->MostSignificantByteFirst];
>waste=ReadListBinary[inline,Byte,120];
>data=ReadListBinary[inline,Byte];
>data=Take[Partition[data,4],(modules*parameters)]; Close[inline];
>ResetDirectory[];
>convert6 = Compile[{b1, b2, b3, b4}, If[b1 < 128,
> (b2*256^2+b3*256+b4)/256^3 16^(b1-64),
> -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]]; ResetDirectory[];
>data=Transpose[Partition[Apply[convert6,data,2],parameters]];
>Dimensions[data]];
>
>--
>Jason Gill
>IBM Microelectronics
>Essex Junction, VT 05452
>Phone (802)769-3350
>Fax (802)769-1220
>email: jgill at vbimail.champlain.edu
Jason,
Your timing seems a little excessive, but it's not clear to me exactly
how many bytes you are reading. On my 180 MHz Pentium Pro, it takes
FastBinaryFiles about 40 sec. to read 10^6 byte-sized numbers. This is
almost entirely the MathLink transfer time--reading the file is very
fast.
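As a rough sanity check on the quoted 15 minutes, here is a back-of-the-envelope estimate. It assumes (my assumption, not something stated in the question) that every element of the [6,1000,300] matrix comes from one 4-byte record and that the MathLink transfer time scales linearly with the byte count:

```mathematica
values = 6*1000*300;          (* matrix dimensions from the question *)
bytes = 4*values;             (* assumed: four bytes per stored number *)
secondsPerByte = 40/10^6;     (* the 40 sec. per 10^6 bytes figure above *)
estimateSeconds = bytes*secondsPerByte;
N[estimateSeconds/60]         (* estimated minutes on the machine above *)
```

Under those assumptions the estimate comes out at a few minutes on the 180 MHz machine, so the real figure will depend on hardware and on how many bytes the files actually hold.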
You will be happy to know that there is a much, much faster way to read
your data than using the FastBinaryFiles MathLink program. Just use the
built-in ReadList function with the Byte type. I've rewritten your
program using ReadList to get the main data. You still need to get a
few numbers as Int16, so I just use the standard Utilities`BinaryFiles`
package for this. That package has a flaw that causes it to be very
slow, but that isn't an issue for the small header data that you need
to read.
Needs["Utilities`BinaryFiles`"]
ReadandConvertLMX[filename_, opts___] :=
  Module[{inline, setpath, convert6, waste},
    (* I took out the directory manipulation. I assume that was a
       workaround for a Windows-specific bug in FastBinaryFiles.
       OpenRead searches $Path, which is probably all you need.
    *)
    (* Using the (new in 3.0) DOSTextFormat->False option
       is critical.
    *)
    inline = OpenRead[filename, DOSTextFormat->False];
    (* Header: four big-endian Int16 values, then 120 bytes to skip.
       As in the original, the header values and data are not Module
       locals, so they stay set after the call.
    *)
    {modules, parameters, timestep, powerHour} =
      ReadListBinary[inline, Int16, 4,
        ByteOrder->MostSignificantByteFirst];
    waste = ReadListBinary[inline, Byte, 120];
    (* Use the built-in ReadList function to get the main data.
       Very fast.
    *)
    data = ReadList[inline, Byte];
    Close[inline];
    data = Take[Partition[data, 4], (modules*parameters)];
    (* b1 carries the sign and a base-16 exponent; b2..b4 the fraction. *)
    convert6 = Compile[{b1, b2, b3, b4}, If[b1 < 128,
      (b2*256^2+b3*256+b4)/256^3 16^(b1-64),
      -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]];
    data = Transpose[Partition[Apply[convert6, data, 2], parameters]];
    Dimensions[data]
  ]
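To see what the convert6 formula does with a single 4-byte record, here is a small uncompiled sketch of the same arithmetic. The name decodeRecord is hypothetical, and reading the layout as sign bit plus base-16 exponent biased by 64 (it resembles IBM-style hexadecimal floating point) is my interpretation of the formula, not something stated in the original code:

```mathematica
(* Hypothetical helper: same formula as convert6, but uncompiled so
   the results stay exact. The top bit of b1 is the sign, the low 7
   bits are a base-16 exponent biased by 64, and b2..b4 form a 24-bit
   fraction. *)
decodeRecord[{b1_, b2_, b3_, b4_}] :=
  If[b1 < 128, 1, -1] * (b2*256^2 + b3*256 + b4)/256^3 *
    16^(Mod[b1, 128] - 64)

decodeRecord[{65, 16, 0, 0}]     (* 1 *)
decodeRecord[{193, 16, 0, 0}]    (* -1 *)
decodeRecord[{66, 100, 0, 0}]    (* 100 *)
```

Trying a few hand-built records like these is a cheap way to confirm the byte order before converting a whole file.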
If this can be done so easily and quickly without using FastBinaryFiles,
then you might ask why FastBinaryFiles was written in the first place.
Two reasons:
1) On Windows, Mathematica simply could not read binary files. It always
opened files in text mode, and thus it always interpreted the bytes in
a special way. In particular, it would read a CR-LF combination as a
single byte, and it stopped at the first 0x1A byte it read, as this is
the EOF indicator in a text file.
2) As mentioned above, the Utilities`BinaryFiles` package has a flaw
that causes it to take astronomically long to read large files.
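The effect described in reason 1 can be simulated on a plain list of bytes. This is only an illustrative sketch: textModeView is a hypothetical name, and I am assuming the usual text-mode convention in which a CR-LF pair comes back as a single LF (10) and reading stops at the first 0x1A (26):

```mathematica
(* Hypothetical simulation of what a text-mode read would return for
   a given sequence of bytes on disk. *)
textModeView[bytes_List] :=
  TakeWhile[bytes, # != 26 &] //. {a___, 13, 10, b___} :> {a, 10, b}

textModeView[{65, 13, 10, 66, 26, 67}]   (* {65, 10, 66} *)
```

Six bytes on disk come back as three, so any fixed-size records that follow would be misaligned or missing entirely.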
Motivation 1 was made irrelevant in Mathematica 3.0, which provides the
DOSTextFormat->False option in OpenRead. Motivation 2 is still valid.
FastBinaryFiles is much faster than the Utilities`BinaryFiles` package.
It should be used whenever you need the convenience of a high-level
interface for reading multi-byte types (Int16, Double, etc.). For
reading straight bytes from a file, though, just open the file with
DOSTextFormat->False and use ReadList[stream, Byte].
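For a small number of multi-byte values, the raw bytes can also be combined by hand. The sketch below is not what the rewritten function above does (it uses ReadListBinary for the header); int16BE is a hypothetical helper that assumes two's-complement 16-bit integers stored most significant byte first:

```mathematica
(* Hypothetical helper: combine two bytes, most significant first,
   into a signed 16-bit integer. *)
int16BE[{hi_, lo_}] :=
  With[{u = 256 hi + lo}, If[u >= 32768, u - 65536, u]]

int16BE /@ Partition[{0, 6, 1, 44, 255, 254}, 2]   (* {6, 300, -2} *)
```

The input list here stands in for bytes obtained with ReadList and the Byte type.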
---Todd Gayley