Re: ByteCount of imported machine-precision data matrix three times
- To: mathgroup at smc.vnet.net
- Subject: [mg92191] Re: ByteCount of imported machine-precision data matrix three times
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Mon, 22 Sep 2008 07:10:27 -0400 (EDT)
- Organization: University of Bergen
- References: <gb7odj$nij$1@smc.vnet.net>
Gareth Russell wrote:
> Hi,
>
> I am encountering some strange memory-related behavior when importing
> numerical data from a file. If anyone is interested, a (small) example
> file is here:
>
> http://web.njit.edu/~russell/Mathematica.html
>
> It's a simple 2D array of numbers. The issue is that when imported,
> ByteCount[] indicates that the resultant expression takes up more than
> three times as much memory as an equivalent machine-precision matrix
> generated within Mathematica. All diagnostics that I can think of
> indicate that the imported expression is equivalent in precision. And
> indeed, ByteCount applied to individual elements of each matrix returns
> 16 as an answer. It's only the overall ByteCount which is hugely
> different.
>
> I discovered a workaround: if I generate a dummy matrix of 0. elements
> (which has the smaller ByteCount), and add it to the imported matrix,
> the result, while appearing identical (as it should), now also has the
> smaller ByteCount.
>
> Does anyone know what it going on here? Until I discovered the
> workaround it was a problem, as I need to read in a large number of
> much larger matrices all together, was encountering memory issues.
>

In fact not all the numbers are machine precision. Try checking each
number in the array:

    Map[MachineNumberQ, data, {-1}]

To reduce the byte count we can first convert all numbers to machine
precision, then pack the array:

    In:= ByteCount[Developer`ToPackedArray@N[data]]
    Out= 2320

A word of caution about using ByteCount for checking memory use: if we
have a list y={x,x,x,x}, its byte count will be four times as large as
x's (plus a few bytes), but in fact x may be stored only a *single* time
by Mathematica, and y might just reference the same piece of memory four
times.
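
As a rough sketch of the two points above, the following can be tried on a small stand-in matrix. The matrix `data` here is hypothetical (it is not the poster's file) and deliberately contains one exact integer among machine reals. The names `packed`, `x`, `y` and `before` are likewise only illustrative, and exact byte counts will vary by version and platform, so none are quoted.

```mathematica
(* Hypothetical stand-in for the imported matrix:
   machine reals plus one exact integer (the 5) *)
data = {{1.5, 2.5, 3.}, {4., 5, 6.5}};

(* Locate the entries that are not machine-precision numbers.
   Heads -> False keeps the List heads out of the search. *)
Position[data, e_ /; ! MachineNumberQ[e], {-1}, Heads -> False]

(* A list with mixed element types cannot be stored as a packed array *)
Developer`PackedArrayQ[data]

(* Convert everything to machine precision, then pack *)
packed = Developer`ToPackedArray[N[data]];
Developer`PackedArrayQ[packed]

(* Compare the reported sizes before and after packing *)
{ByteCount[data], ByteCount[packed]}

(* ByteCount versus actual memory use: y refers to x four times *)
x = RandomReal[1, 10^6];
before = MemoryInUse[];
y = {x, x, x, x};

(* ByteCount counts x four times over, while the growth in
   MemoryInUse should be far smaller if x is shared *)
{ByteCount[x], ByteCount[y], MemoryInUse[] - before}
```

If the `Position` call returns any positions for the real imported data, those entries (for example integers written without a decimal point in the file) would be what prevents packing. `MemoryInUse[]` is probably a better guide than `ByteCount` when checking whether a batch of large matrices will fit in memory.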