Re: Structure of "identical" data not equal in size

To: mathgroup at smc.vnet.net
Subject: [mg126930] Re: Structure of "identical" data not equal in size
From: Szabolcs Horvát <szhorvat at gmail.com>
Date: Mon, 18 Jun 2012 05:45:09 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <jrk2s2$8n0$1@smc.vnet.net>

On 2012.06.17. 10:01, Paul E McHale wrote:
> (* First, the internal data and then writing it to CSV *)
> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> SetDirectory[NotebookDirectory[]]
> Export["Test2.csv",m, "CSV"]
>
> (*Reading it back in *)
> mIn = Import["Test2.csv", "CSV"]
>
> (* compare data *)
> mIn==m
> >> True
>
>
> (* see memory usage *)
> ByteCount[m]
> >> 24168
>
> ByteCount[mIn]
> >> 144040
>
> (* what is the minimum size using reals *)
> 1000 * 3 * 8
> >> 24000
>
> Actual size: 24168
> Read in file data: 144040
>
> (* Size *)
> ByteCount[mIn]/ByteCount[m] * 1.0
> >> 5.96
>
> --------------
>
> Why is the "same" data taking up 6x the memory after being written to disk and read back in. This is a serious problem as we have large data at work being shared by files and Mathematica is currently the only language that can't read it (C# being the other language).
>
> How can they pass for equal yet internally generated takes much less memory?
>
> Any input welcome as I can only reduce the data with an external editor so I can try to work with it.
>

The reason is that mIn is not a packed array:

Developer`PackedArrayQ[m]
(* ==> True *)

Developer`PackedArrayQ[mIn]
(* ==> False *)

I guess the reason Import doesn't return a packed array is that CSV files can hold inhomogeneous data (e.g. both numbers and strings), while packed arrays are always homogeneous.

You can try exporting to other formats, perhaps as binary data (which should still be easy to read using C#).

Alternatively, convert back to a packed array right after you read it in, and assign the result, since ToPackedArray returns a new expression rather than modifying mIn:

mIn = Developer`ToPackedArray[mIn]

Packed arrays give you the same storage efficiency as a plain array of doubles in C#.

--
Szabolcs Horvát

Visit Mathematica.SE: http://mathematica.stackexchange.com/
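The two suggestions above (re-packing after Import, or using a binary format instead of CSV) can be sketched roughly as follows. This is a minimal sketch, not code from the original post. Assumptions: the N[...] step is added here to guard against some CSV fields coming back as integers (a list mixing integers and reals may not pack), and the file name "Test2.bin" is hypothetical. BinaryWrite uses the machine's native byte order ($ByteOrdering), which on typical x86 machines is little-endian; C#'s BinaryReader.ReadDouble also reads little-endian, but check this on your own platform.

```mathematica
(* Sketch, not from the original post: two ways to end up with a packed array *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
Export["Test2.csv", m, "CSV"];

(* Option 1: import the CSV, force machine reals with N (assumption: all fields
   are numeric), then pack. PackedArrayQ stays False if any field is a string. *)
mPacked = Developer`ToPackedArray[N[Import["Test2.csv", "CSV"]]];
Developer`PackedArrayQ[mPacked]
{ByteCount[mPacked], ByteCount[m]}

(* Option 2: write the matrix as a flat stream of 64-bit doubles in native
   byte order; "Test2.bin" is a hypothetical file name *)
BinaryWrite["Test2.bin", Flatten[m], "Real64"];
Close["Test2.bin"];

(* Read the flat list back and restore the 1000 x 3 shape *)
mBin = Developer`ToPackedArray[
   Partition[BinaryReadList["Test2.bin", "Real64"], 3]];
Dimensions[mBin]
mBin == m
```

The binary route also avoids the decimal text conversion entirely, so the values read back are bit-for-bit the doubles that were written, whereas the CSV route depends on how many digits Export writes.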