MathGroup Archive 2012



Re: Structure of "identical" data not equal in size

  • To: mathgroup at smc.vnet.net
  • Subject: [mg126922] Re: Structure of "identical" data not equal in size
  • From: Paul E McHale <paulmchale at me.com>
  • Date: Mon, 18 Jun 2012 05:42:23 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <jrk2s2$8n0$1@smc.vnet.net> <4FDDCBE6.2080300@gmail.com>

On Jun 17, 2012, at 8:21 AM, Szabolcs Horvát wrote:

> On 2012.06.17. 10:01, Paul E McHale wrote:
>> (* First, the internal data and then writing it to CSV   *)
>> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
>> SetDirectory[NotebookDirectory[]]
>> Export["Test2.csv",m, "CSV"]
>>
>> (*Reading it back in *)
>> mIn = Import["Test2.csv", "CSV"]
>>
>> (* compare data *)
>> mIn==m
>>
>>>> True
>>
>>
>> (* see memory usage *)
>> ByteCount[m]
>>>> 24168
>>
>> ByteCount[mIn]
>>>> 144040
>>
>> (* what is the minimum size using reals *)
>> 1000 * 3 * 8
>>>> 24000
>>
>> Actual size: 24168
>> Read in file data: 144040
>>
>> (* Size *)
>> ByteCount[mIn]/ByteCount[m] * 1.0
>>>> 5.96
>>
>> --------------
>>
>> Why is the "same" data taking up 6x the memory after being written to disk and read back in?  This is a serious problem, as we have large data sets at work that are shared via files, and of the two languages we use (the other being C#), Mathematica is currently the only one that can't read them.
>>
>> How can the two pass as equal when the internally generated array takes up so much less memory?
>>
>> Any input is welcome; at the moment I can only work with the data after cutting it down in an external editor.
>>
>
> The reason is that mIn is not a packed array:
>
> Developer`PackedArrayQ[m]
>
> (* ==> True *)
>
> Developer`PackedArrayQ[mIn]
>
> (* ==> False *)
>
> I guess the reason Import doesn't return a packed array is that CSV files can hold inhomogeneous data (e.g. both numbers and strings), while packed arrays are always homogeneous.  You can try exporting to other formats, perhaps as binary data (which should still be easy to read using C#).
>
> Alternatively, convert back to a packed array right after you read it in:
>
> Developer`ToPackedArray[mIn]
>
> Packed arrays give you the same storage efficiency as C#.
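A minimal, self-contained sketch of that conversion, writing the test data to a temporary file instead of the notebook directory. Applying N before packing is a precaution and an assumption, not something stated in the thread: if any column were to come back as exact integers, Developer`ToPackedArray could not pack a mix of integers and reals, so the data is first coerced to machine reals.

```mathematica
(* Recreate the example data and write it to a temporary CSV file *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
csvFile = FileNameJoin[{$TemporaryDirectory, "Test2.csv"}];
Export[csvFile, m, "CSV"];

(* Import returns a plain list of lists *)
mIn = Import[csvFile, "CSV"];

(* N turns any integers into machine reals so the whole array can be packed *)
mPacked = Developer`ToPackedArray[N[mIn]];

{Developer`PackedArrayQ[mIn], Developer`PackedArrayQ[mPacked]}
{ByteCount[mIn], ByteCount[mPacked]}
```

Note that the unpacked mIn still has to fit in memory before it is packed.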


---------------
Thank you for the reply.  This works acceptably if the data can fit in memory as an unpacked array.  There appears to be no reason for the array to be unpacked, other than that Mathematica does not pack while importing, even when the file is homogeneous.  I have tried binary with homogeneous specifications, {Real64, Real64, Real64}.  Is there any format that will read homogeneous data in as a packed array?

The unpacked arrays are simply too large, taking over 5x the memory.
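One possible approach, sketched under assumptions not confirmed in this thread: store the data as a flat stream of little-endian 64-bit doubles written row by row (the byte order C#'s BinaryWriter uses), read it with BinaryReadList using the single type "Real64" rather than a {"Real64", "Real64", "Real64"} record specification, and Partition the result into rows of three. The file name Test2.bin is hypothetical, and the row length 3 comes from the example data above. Developer`ToPackedArray is wrapped around the result only to guarantee packing; it does nothing if the array is already packed.

```mathematica
(* Hypothetical binary file holding the same 1000 x 3 data as raw doubles *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
binFile = FileNameJoin[{$TemporaryDirectory, "Test2.bin"}];

(* Write the rows one after another as little-endian Real64, then close *)
BinaryWrite[binFile, Flatten[m], "Real64", ByteOrdering -> -1];
Close[binFile];

(* Read back as one flat list of reals and regroup into rows of 3 *)
mBin = Developer`ToPackedArray[
   Partition[BinaryReadList[binFile, "Real64", ByteOrdering -> -1], 3]];

{Developer`PackedArrayQ[mBin], Dimensions[mBin], mBin == m}
{ByteCount[m], ByteCount[mBin]}
```

For files too large to read at once, BinaryReadList also accepts a count, as in BinaryReadList[stream, "Real64", n], so an open stream can be consumed in chunks.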

Thank you,
Paul



>
> --
> Szabolcs Horvát
> Visit Mathematica.SE:  http://mathematica.stackexchange.com/



