MathGroup Archive 2012


Re: Structure of "identical" data not equal in size

  • To: mathgroup at smc.vnet.net
  • Subject: [mg126927] Re: Structure of "identical" data not equal in size
  • From: awnl <awnl at gmx-topmail.de>
  • Date: Mon, 18 Jun 2012 05:44:06 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <jrk2s2$8n0$1@smc.vnet.net>

Hi,

> (* First, the internal data and then writing it to CSV   *)
> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> SetDirectory[NotebookDirectory[]]; Export["Test2.csv", m, "CSV"]
>
> (*Reading it back in *) mIn = Import["Test2.csv", "CSV"]
>
> (* compare data *) mIn==m
>
>>> True
>
>
> (* see memory usage *) ByteCount[m]
>>> 24168
>
> ByteCount[mIn]
>>> 144040
>
> (* what is the minimum size using reals *) 1000 * 3 * 8
>>> 24000
>
> Actual size: 24168
> Read-in file data: 144040
>
> (* Size *) ByteCount[mIn]/ByteCount[m] * 1.0
>>> 5.96
>
> --------------
>
> Why is the "same" data taking up 6x the memory after being written to
> disk and read back in? This is a serious problem, as we have large
> data at work being shared by files, and Mathematica is currently the
> only one of our two languages (the other being C#) that can't read it.

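The keyword is "packed arrays". The m generated by Table is stored as a
packed array (one contiguous block of machine reals), while Import
returns an ordinary nested list in which every number is a separate
expression with its own overhead. You can check this with:

Developer`PackedArrayQ[m]      (* True *)
Developer`PackedArrayQ[mIn]    (* False *)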
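A minimal way to reproduce the effect without going through a file is to unpack m explicitly with Developer`FromPackedArray. The unpacked copy u is only a stand-in for what Import returns, not a description of how Import works internally:

```mathematica
(* m is built by Table exactly as in the question; Table packs it automatically *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* u holds the same numbers as an ordinary (unpacked) nested list,
   standing in for the result of Import *)
u = Developer`FromPackedArray[m];

{Developer`PackedArrayQ[m], Developer`PackedArrayQ[u]}  (* {True, False} *)
u == m                                                   (* True *)
ByteCount[u] > ByteCount[m]                              (* True *)
```

The values are identical; only the storage differs.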

> How can they pass for equal when the internally generated data takes
> much less memory?

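Equal compares the values, not the internal representation, so a packed
and an unpacked array with the same entries test as equal. You can find
more about packed arrays in the documentation, and a lot more in this
newsgroup's archive and on Mathematica Stack Exchange.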

> Any input is welcome; at the moment I can only reduce the data with an
> external editor so that I can work with it.

What you can try is to convert it to a packed array after importing:

mIn = Developer`ToPackedArray[mIn];
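ToPackedArray only packs a list whose elements all have the same machine type, so it may leave the data unpacked if some numbers in the file were written without a decimal point and come back as integers. Applying N first should coerce everything to machine reals. A small sketch, where the unpacked copy with an integer first column is a hypothetical stand-in for imported data:

```mathematica
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* hypothetical imported data: unpacked, with the first column as exact integers *)
mIn = Developer`FromPackedArray[m];
mIn[[All, 1]] = Round[mIn[[All, 1]]];

(* N turns the integers into machine reals so the whole array can be packed *)
mIn = Developer`ToPackedArray[N[mIn]];

Developer`PackedArrayQ[mIn]       (* True *)
ByteCount[mIn] == ByteCount[m]    (* True *)
```

If the file has a text header row, it has to be dropped before packing, since strings cannot be part of a packed array.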

Unfortunately, that doesn't help if the Import itself fails because there
isn't enough memory to hold the intermediate expression. In that case you
might have to read the data line by line (or a chunk of lines at a time)
and convert it to a packed array as you go along. This can be done, but it
takes some effort, as you have to make sure that the data is never unpacked
along the way.
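A rough sketch of the chunked approach, under stated assumptions: importPackedCSV is a hypothetical helper (not a built-in), the file is assumed to be purely numeric and rectangular with no header and no blank lines, and each chunk is parsed with ImportString rather than by hand so that number formats are handled by the CSV importer:

```mathematica
(* importPackedCSV is a hypothetical helper, not a built-in function.
   It assumes a purely numeric, rectangular CSV file with no header
   and no blank lines. *)
importPackedCSV[file_String, chunkSize_Integer : 10000] :=
 Module[{stream, lines, chunks = {}},
  stream = OpenRead[file];
  (* ReadList with a count reads at most chunkSize lines and returns {}
     once the end of the file has been reached *)
  While[(lines = ReadList[stream, String, chunkSize]) =!= {},
   (* parse just this chunk, coerce it to machine reals and pack it,
      so only one chunk exists in unpacked form at any time *)
   AppendTo[chunks,
    Developer`ToPackedArray[
     N[ImportString[StringJoin[Riffle[lines, "\n"]], "CSV"]]]]
   ];
  Close[stream];
  (* join the packed chunks into one array and make sure it is packed *)
  Developer`ToPackedArray[Join @@ chunks]
  ]
```

Usage would then be along the lines of mIn = importPackedCSV["Test2.csv"]. The chunk size is a trade-off between peak memory and the overhead of many small ImportString calls.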

hth,

albert


