Re: convert txt to binary file

To: mathgroup at smc.vnet.net
Subject: [mg108983] Re: convert txt to binary file
From: John Fultz <jfultz at wolfram.com>
Date: Fri, 9 Apr 2010 03:31:47 -0400 (EDT)

On Thu, 8 Apr 2010 08:02:40 -0400 (EDT), Bill Rowe wrote:
> On 4/7/10 at 3:21 AM, jihane.ajaja at mail.mcgill.ca (jihane) wrote:
>
>> Thank you for all your replies. To give more details about my file:
>> it is a file with numerical data, presented in 3 columns for the x, y,
>> and z axes (they are acceleration measurements). My computer is a
>> 64-bit machine with 8 GB of RAM. Why is my file so huge? Well, the
>> measurements are done 1000 times per second. I can't ignore parts of
>> the data; I need to analyze all of it. I can't think of any other
>> useful detail I could provide. What I want to do with this data is
>> some basic calculations and a couple of plots.
>>
> A few thoughts. Your other message indicated each value in your
> file is represented with ~16 characters, including the character
> that acts as the column separator. A bit of experimentation on
> my system (also a 64-bit system) with ByteCount indicates
> Mathematica stores a single real value using 16 bytes. So, the
> 4 GB file you have will occupy a bit more than 4 GB of internal
> memory, since I assume the data will be in an array. Mathematica
> has some overhead associated with internal storage of arrays in
> addition to the storage for each element of the array.
>
> Since this is more than half the RAM you have available, it is
> clear any operation that would make a copy of the array will
> fail due to insufficient memory. In fact, when you consider the RAM
> needed by your operating system, the Mathematica program, and
> anything else you might have running, it is pretty easy to see
> there won't be much you can do with that much data read into RAM.
>
> Further, even if you could actually create a simple time plot of
> all the data after it is read into RAM, the resolution of your
> screen and printer is not sufficient to display that much data
> at once. For example, my system has a horizontal display width
> of 1280 pixels. If I were to plot a data array with, say, 10,000
> data points, it is clear some of those points must overlap. And
> note, 10,000 real triplets would need less than ~500 KB to
> store. So, 10,000 data points is clearly a small subset of your data.
>
> Your only real choices for working with as much data as you've
> described are either to intelligently downsample the data or to
> break the data up into smaller blocks and work with those. And
> note, this really isn't a limitation Mathematica imposes. It is
> an inherent limitation of the amount of RAM you have available
> and real display sizes.

I wouldn't dispute your general remarks about the wisdom of
down-sampling. But let me just correct one statement you made.

You correctly point out that a Real consumes 16 bytes. But it is not
correct to generalize from this that an array of Reals will consume
16 bytes per Real. If the array is constructed so that the numbers
pack, then you'll get 8 bytes per Real, plus a small number of
additional bytes for the array as a whole. Some evidence to show this:

In[1]:= ByteCount[RandomReal[]]

Out[1]= 16

In[2]:= ByteCount[Table[RandomReal[], {1000}]]

Out[2]= 8124

Although I'm aware of some of the issues, I'm not the best person to
discuss the details of dealing with packed arrays. I will say that the
typical experience should be that, as long as you're dealing with
uniform data (all Real or all Integer), Mathematica should
automatically pack things for you.

Of course, packed arrays only throw off your numbers by a factor of
two, so I'll re-emphasize that this doesn't invalidate your general
conclusions about how to tackle the problem.

Sincerely,

John Fultz
jfultz at wolfram.com
User Interface Group
Wolfram Research, Inc.
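The memory estimates discussed in this thread come down to simple arithmetic. A minimal sketch, assuming a 4 GB file taken as 4 × 1024^3 bytes and ~16 characters per value (both approximations from the thread); it ignores the small per-array header and any operating-system overhead:

```mathematica
(* Assumed inputs: a 4 GB text file, ~16 characters per value incl. separator *)
fileBytes = 4*1024^3;
charsPerValue = 16;

(* Number of Real values in the file *)
nValues = fileBytes/charsPerValue
(* 268435456 *)

(* Unpacked: ~16 bytes per Real, as ByteCount[RandomReal[]] shows *)
unpackedGB = N[16 nValues/1024^3]
(* 4. *)

(* Packed: 8 bytes per Real, plus a small fixed header not counted here *)
packedGB = N[8 nValues/1024^3]
(* 2. *)

(* Bill's 10,000-triplet example, unpacked vs. packed, in bytes *)
tripletBytes = {16, 8}*3*10^4
(* {480000, 240000} *)
```

Either way the conclusion in the thread stands: even the packed figure is a quarter of the 8 GB of RAM before any copy is made.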
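Whether a given array is actually packed can be checked with Developer`PackedArrayQ, and Developer`ToPackedArray can pack uniform data. A short sketch; the variable names are illustrative, and the point is only that a single non-Real element is enough to keep a list from packing:

```mathematica
(* RandomReal with a length argument returns a packed vector of Reals *)
v = RandomReal[1, 1000];
Developer`PackedArrayQ[v]
(* True *)

(* Appending an Integer makes the data non-uniform, so the result is unpacked *)
w = Append[v, 1];
Developer`PackedArrayQ[w]
(* False *)

(* N makes every element a Real again; ToPackedArray then packs it *)
p = Developer`ToPackedArray[N[w]];
Developer`PackedArrayQ[p]
(* True *)

(* The unpacked version uses noticeably more memory than the packed one *)
ByteCount /@ {w, p}
```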
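Bill's suggestion to work in smaller blocks can be combined with downsampling by streaming the file through ReadList and keeping only one summary per block. A minimal sketch under stated assumptions: the file has three whitespace-separated numeric columns; the demo first writes a small synthetic file to a temporary path (the name accel_demo.txt is hypothetical, so substitute the real file); and per-block means are just one possible downsampling choice:

```mathematica
(* Illustration only: a small synthetic x y z file in place of the real data *)
file = FileNameJoin[{$TemporaryDirectory, "accel_demo.txt"}];
Export[file, RandomReal[{1, 2}, {5000, 3}], "Table"];

blockSize = 1000;  (* rows per block: 1 second of data at 1000 samples/s *)

stream = OpenRead[file];
blockMeans = Reap[
    While[
      (* read up to blockSize {x, y, z} records; {} signals end of file *)
      (block = ReadList[stream, {Real, Real, Real}, blockSize]) =!= {},
      Sow[Mean[block]]  (* column means: {mean x, mean y, mean z} *)
    ]
  ][[2, 1]];
Close[stream];
DeleteFile[file];

(* one row per block; pack it so it costs 8 bytes per value *)
blockMeans = Developer`ToPackedArray[blockMeans];
ListLinePlot[Transpose[blockMeans]]
```

With a real file, only blockSize rows are held in memory at a time, and at 1000 samples per second each block mean summarizes one second of data.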