Re: Binary Files
- To: mathgroup at smc.vnet.net
- Subject: [mg13972] Re: Binary Files
- From: tgayley at mcs.net (Todd Gayley)
- Date: Fri, 11 Sep 1998 15:07:00 -0400
- Organization: MCSNet Services
- References: <6ssvcd$m93@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
On 5 Sep 1998 23:26:05 -0400, Jason Gill <jgill at vbimail.champlain.edu> wrote:

> I am using the following function to read binary files. I would not
> consider the data files very large, and yet the process is extremely
> slow. I am running Mathematica 3.0 for win95. I have installed the
> MathLink program FastReadBinary Files, which helps a ton, but it is
> still too slow. The net is I use this function on multiple files to
> create a three dimensional data matrix with Dimensions on the order of
> [6,1000,300]. Mathematica handles the data O.K. once loaded, but the
> loading takes too long. For the aforementioned matrix, loading the
> files can take upwards of 15 minutes, which seems excessive to me.
> Using the timing function, the largest consumer of time is the actual
> binary read. Does anyone have any suggestions? I also have access to
> Mathematica (3.0) running on an RS/6000; is there a MathLink equivalent
> to Fast Read Binary available for this platform? Any suggestions would
> be greatly appreciated.
>
> Options[ReadandConvertLMX]={SetDirPath->"d:\lmxdata\janussq"};
>
> ReadandConvertLMX[filename_,opts___]:=
> Module[{inline,setpath,convert6,waste,
>  },
> directory=SetDirPath /. {opts} /. Options[ReadandConvertLMX];
> SetDirectory[directory];
>  inline = OpenReadBinary[filename];
>  {modules,parameters,timestep,powerHour}=
>  ReadListBinary[inline,Int16,4,
> ByteOrder->MostSignificantByteFirst];
> waste=ReadListBinary[inline,Byte,120];
> data=ReadListBinary[inline,Byte];
> data=Take[Partition[data,4],(modules*parameters)]; Close[inline];
> ResetDirectory[];
> convert6 = Compile[{b1, b2, b3, b4}, If[b1 < 128,
>  (b2*256^2+b3*256+b4)/256^3 16^(b1-64),
>  -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]]; ResetDirectory[];
> data=Transpose[Partition[Apply[convert6,data,2],parameters]];
> Dimensions[data]];
>
> --
> Jason Gill
> IBM Microelectronics
> Essex Junction, VT 05452
> Phone (802)769-3350
> Fax (802)769-1220
> email: jgill at vbimail.champlain.edu

Jason,

Your timing seems a little excessive, but it's not clear to me exactly
how many bytes you are reading. On my 180 MHz Pentium Pro, it takes
FastBinaryFiles about 40 sec. to read 10^6 byte-sized numbers. This is
almost entirely the MathLink transfer time; reading the file itself is
very fast.

You will be happy to know that there is a much, much faster way to read
your data than using the FastBinaryFiles MathLink program. Just use the
built-in ReadList function with the Byte type. I've rewritten your
program using ReadList to get the main data. You still need to get a few
numbers as Int16, so I just use the standard Utilities`BinaryFiles`
package for this. That package has a flaw that causes it to be very
slow, but that isn't an issue for the small header data that you need to
read.

    Needs["Utilities`BinaryFiles`"]

    ReadandConvertLMX[filename_, opts___] :=
      Module[{inline, setpath, convert6, waste},

        (* I took out the directory manipulation. I assume that was a
           workaround for a Windows-specific bug in FastBinaryFiles.
           OpenRead searches $Path, which is probably all you need. *)

        (* Using the (new in 3.0) DOSTextFormat->False option is
           critical. *)
        inline = OpenRead[filename, DOSTextFormat->False];
        {modules, parameters, timestep, powerHour} =
          ReadListBinary[inline, Int16, 4,
            ByteOrder->MostSignificantByteFirst];
        waste = ReadListBinary[inline, Byte, 120];

        (* Use the built-in ReadList function to get the main data.
           Very fast. *)
        data = ReadList[inline, Byte];
        Close[inline];

        data = Take[Partition[data, 4], (modules*parameters)];
        convert6 = Compile[{b1, b2, b3, b4},
          If[b1 < 128,
            (b2*256^2+b3*256+b4)/256^3 16^(b1-64),
            -(b2*256^2+b3*256+b4)/256^3 16^(b1-192)]];
        data = Transpose[Partition[Apply[convert6, data, 2], parameters]];
        Dimensions[data]
      ]

If this can be done so easily and quickly without using FastBinaryFiles,
then you might ask why FastBinaryFiles was written in the first place.
Two reasons:

1) On Windows, Mathematica simply could not read binary files. It always
   opened files in text mode, and thus it always interpreted the bytes
   in a special way. In particular, it would read a CR-LF combination as
   a single byte, and it stopped at the first 0x1A byte it read, as this
   is the EOF indicator in a text file.

2) As mentioned above, the Utilities`BinaryFiles` package has a flaw
   that causes it to take astronomically long to read large files.

Motivation 1 was made irrelevant in Mathematica 3.0, which provides the
DOSTextFormat->False option in OpenRead. Motivation 2 is still valid.
FastBinaryFiles is much faster than the Utilities`BinaryFiles` package.
It should be used whenever you need the convenience of a high-level
interface for reading multi-byte types (Int16, Double, etc.). For
reading straight bytes from a file, though, just open the file with
DOSTextFormat->False and use ReadList[stream, Byte].

---Todd Gayley
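
A small sanity check on the four-byte conversion that convert6 performs may be useful before running it over a whole file. The sketch below restates the same formula as an ordinary (uncompiled) function so the results stay exact. The name decode and the test byte values are hypothetical, chosen only for illustration; they are not taken from any real LMX file. The layout (a sign bit, an excess-64 base-16 exponent in the first byte, and a 24-bit fraction in the remaining three) appears to match IBM hexadecimal single precision, but that identification is an assumption rather than something stated in this thread.

```mathematica
(* Hypothetical helper restating the convert6 formula with exact arithmetic.
   b1 carries the sign (values >= 128 are negative) and the base-16 exponent
   (excess 64); b2, b3, b4 form a 24-bit fraction. *)
decode[{b1_, b2_, b3_, b4_}] :=
  If[b1 < 128, 1, -1] *
    (b2*256^2 + b3*256 + b4)/256^3 *
    16^(Mod[b1, 128] - 64)

(* Illustrative byte quadruples, not real data. *)
decode[{65, 16, 0, 0}]     (* 1:   fraction 1/16,    exponent 16^1 *)
decode[{193, 16, 0, 0}]    (* -1:  same magnitude, sign bit set     *)
decode[{66, 100, 0, 0}]    (* 100: fraction 100/256, exponent 16^2 *)
```

For b1 >= 128, Mod[b1, 128] - 64 equals b1 - 192, so this agrees with both branches of convert6. If the values decoded from a real file look implausible, comparing a few quadruples by hand in this way is a quick test of whether the bytes are being partitioned at the right offset.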