MathGroup Archive 2012

Re: Structure of "identical" data not equal in size

  • To: mathgroup at smc.vnet.net
  • Subject: [mg126930] Re: Structure of "identical" data not equal in size
  • From: Szabolcs Horvát <szhorvat at gmail.com>
  • Date: Mon, 18 Jun 2012 05:45:09 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <jrk2s2$8n0$1@smc.vnet.net>

On 2012.06.17. 10:01, Paul E McHale wrote:
> (* First, the internal data and then writing it to CSV   *)
> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> SetDirectory[NotebookDirectory[]]
> Export["Test2.csv",m, "CSV"]
>
> (*Reading it back in *)
> mIn = Import["Test2.csv", "CSV"]
>
> (* compare data *)
> mIn==m
>
>>> True
>
>
> (* see memory usage *)
> ByteCount[m]
>>> 24168
>
> ByteCount[mIn]
>>> 144040
>
> (* what is the minimum size using reals *)
> 1000 * 3 * 8
>>> 24000
>
> Actual size: 24168
> Read in file data: 144040
>
> (* Size *)
> ByteCount[mIn]/ByteCount[m] * 1.0
>>> 5.96
>
> --------------
>
> Why does the "same" data take up 6x the memory after being written to disk and read back in? This is a serious problem: at work we share large data sets through files, and of the two languages we use (the other being C#), Mathematica is currently the only one that can't read them.
>
> How can the two pass as equal when the internally generated version takes much less memory?
>
> Any input is welcome. Right now the only way I can work with the data is to cut it down with an external editor first.
>

The reason is that mIn is not a packed array:

Developer`PackedArrayQ[m]

(* ==> True *)

Developer`PackedArrayQ[mIn]

(* ==> False *)
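To see that the difference is purely one of storage, here is a small sketch (mUnpacked is a name introduced only for illustration) that unpacks the original table with Developer`FromPackedArray. Equal compares values, not the internal representation, so the comparison still succeeds while ByteCount grows, just as it does for the imported data.

```mathematica
(* Rebuild the original packed table *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* Same numbers, but stored as an ordinary nested list of Real expressions *)
mUnpacked = Developer`FromPackedArray[m];

(* Equal only looks at the values, so this is True even though one is packed *)
{mUnpacked == m, Developer`PackedArrayQ[m], Developer`PackedArrayQ[mUnpacked]}

(* The unpacked form needs considerably more memory *)
{ByteCount[m], ByteCount[mUnpacked]}
```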

I guess the reason Import doesn't return a packed array is that CSV
files can hold inhomogeneous data (e.g. both numbers and strings), while
packed arrays are always homogeneous. You could try exporting to another
format instead, for example raw binary data (which should still be easy
to read from C#).
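A minimal sketch of the binary route, assuming the data is a plain 1000 x 3 table of machine reals. The file name Test2.bin is hypothetical, and writing little-endian doubles (ByteOrdering -> -1) is an assumption chosen to match .NET's BinaryReader.ReadDouble, which reads little-endian values. BinaryReadList returns a flat list, so Partition restores the three columns.

```mathematica
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* Hypothetical output file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "Test2.bin"}];
If[FileExistsQ[file], DeleteFile[file]];

(* Write all 3000 numbers as 8-byte little-endian doubles, row by row *)
BinaryWrite[file, Flatten[m], "Real64", ByteOrdering -> -1];
Close[file];

(* Read the doubles back and regroup them into rows of 3 columns *)
mBin = Partition[BinaryReadList[file, "Real64", ByteOrdering -> -1], 3];

{mBin == m, Dimensions[mBin], Developer`PackedArrayQ[mBin], ByteCount[mBin]}
```

Since no text conversion is involved, the values come back bit-for-bit identical, and the file is exactly 8 bytes per number.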

Alternatively convert back to a packed array right after you read it in:

Developer`ToPackedArray[mIn]

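A packed array stores its machine reals contiguously at 8 bytes each, much like a C# double[] array, so it gives you essentially the same storage efficiency as C#.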
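A sketch of the read-then-pack approach, written as a standalone script rather than relative to NotebookDirectory[] (the temporary file location is an assumption for illustration). The N[...] step is an addition of mine: ToPackedArray may leave the list unpacked if its elements are not all of the same type (for instance if some CSV fields come back as integers), and N turns every number into a machine real first.

```mathematica
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* Hypothetical location for the CSV round trip *)
file = FileNameJoin[{$TemporaryDirectory, "Test2.csv"}];
Export[file, m, "CSV"];

(* Import gives an ordinary nested list *)
mIn = Import[file, "CSV"];

(* Make all elements machine reals, then convert to packed form *)
mPacked = Developer`ToPackedArray[N[mIn]];

{Developer`PackedArrayQ[mIn], Developer`PackedArrayQ[mPacked]}
{ByteCount[mIn], ByteCount[mPacked]}
```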

-- 
Szabolcs Horvát
Visit Mathematica.SE:  http://mathematica.stackexchange.com/
