Inputting integer data in binary form
- To: mathgroup at yoda.physics.unc.edu
- Subject: Inputting integer data in binary form
- From: dmg at oceanus.mitre.org (David M Goblirsch)
- Date: Wed, 20 May 92 07:47:23 EDT
I'm working on a Sun system with Mathematica 1.2. I need to input a file of integer data which I have in binary form, two bytes per integer. Can anyone make any suggestions about how to do this? More generally, are there Mathematica functions or packages for inputting other types of binary files, containing, for example, four-byte integers, four-byte floats, eight-byte floats, etc.?

I needed this too, and couldn't find a direct Mma function for doing it. So I convert the files to ASCII and then read them using ReadList. One way to do this is to convert each file and keep the ASCII files on disk. This doesn't work for me because I have hundreds of megabytes of speech data, so I want to keep it in binary format. Another way is to write a Mma function which does the translation for you on the fly. Here is a function I use for reading binary short int files:

    rdxbin[ fileId_String ] :=
      ReadList[
        StringJoin[ "!od -vi ", xpath[fileId], ".bin",
                    " | awk '{$1 = \"\" ; print }'" ],
        Number ]

Using ReadList with a string beginning with ! runs the specified program (in this case the UNIX program od) and reads that program's output through a pipe. In the function above, the command is built from pieces using StringJoin. xpath[fileId] is another Mma function I wrote which just returns the full path name of the file I want; fileId is a string just specific enough to identify the file relative to some home directory.

od is the UNIX utility for dumping binary files. Do a man on od and you'll see that -i interprets the bytes as short ints and -v shows ALL the data, even when that means printing repeated lines. (Trust me on this: do NOT forget the v flag!! Without it, od replaces a run of identical output lines with a single "*", and those values silently disappear from your data.)

One problem: od prints an address (the byte offset) in the first column of each line, so you have to strip it off, hence the step through awk. Finally, the second argument to ReadList is Number because, by that point, the stream contains ASCII numbers.
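The od/awk step can also be done by a standalone filter that reads raw binary on standard input and writes one ASCII number per line. Below is a minimal sketch, written in Python purely for illustration. The script name bin2ascii.py, the one-letter type codes (borrowed from Python's struct module), and the big-endian default are assumptions, not part of the original post. Two properties matter for this use: it prints every value, repeats included, and it writes no address column. It also covers the four-byte integer and four- and eight-byte float cases asked about.

```python
import struct
import sys

# Hypothetical type codes for this sketch, using Python struct letters:
#   h = 2-byte signed int    i = 4-byte signed int
#   f = 4-byte IEEE float    d = 8-byte IEEE float
TYPE_CODES = ("h", "i", "f", "d")


def convert_stream(inp, out, code="h", byteorder=">", chunk_values=65536):
    """Read packed binary values from inp; write one decimal number per line to out.

    inp must be a binary file object and out a text file object.
    byteorder '>' assumes big-endian data (e.g. written on a SPARC Sun);
    use '<' for little-endian or '=' for the current machine's byte order.
    Input is read in chunks, so a large file is never held in memory at once.
    A trailing partial value (fewer bytes than one item) is dropped.
    """
    if code not in TYPE_CODES:
        raise ValueError(f"unsupported type code: {code!r}")
    item = struct.Struct(byteorder + code)
    leftover = b""
    while True:
        chunk = inp.read(item.size * chunk_values)
        if not chunk:
            break
        buf = leftover + chunk
        usable = len(buf) - len(buf) % item.size
        # Every value is written, repeated ones included (no "*" collapsing),
        # and there is no address column, so nothing needs to be stripped.
        for (value,) in item.iter_unpack(buf[:usable]):
            out.write(f"{value}\n")
        # Keep any incomplete item so it can be joined with the next chunk.
        leftover = buf[usable:]


def main(argv):
    """Command-line entry point: bin2ascii.py TYPE < input.bin"""
    if len(argv) != 2 or argv[1] not in TYPE_CODES:
        sys.stderr.write("usage: bin2ascii.py h|i|f|d < file.bin\n")
        return
    convert_stream(sys.stdin.buffer, sys.stdout, argv[1])


if __name__ == "__main__":
    main(sys.argv)
```

From Mathematica, a call such as ReadList["!python3 bin2ascii.py h < /path/to/file.bin", Number] should then play the same role as rdxbin. The path is a placeholder. Reading in fixed-size chunks keeps memory use bounded, which matters for hundreds of megabytes of data.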
It probably would be more efficient to write a C program to do the translation, and perhaps perl could be used instead of awk, but this version works fast enough for my purposes.

David M. Goblirsch
The MITRE Corporation, McLean VA 22102
dmgob at mitre.org
(703) 883-5450

ions for processing data from UNIX commands, programs, or files.

Robby Villegas
Knox College
(Villegas at Knox.Bitnet)