Re: Structure of "identical" data not equal in size
- To: mathgroup at smc.vnet.net
- Subject: [mg126927] Re: Structure of "identical" data not equal in size
- From: awnl <awnl at gmx-topmail.de>
- Date: Mon, 18 Jun 2012 05:44:06 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jrk2s2$8n0$1@smc.vnet.net>
Hi,
> (* First, the internal data and then writing it to CSV *)
> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> SetDirectory[NotebookDirectory[]];
> Export["Test2.csv",m, "CSV"]
>
> (*Reading it back in *) mIn = Import["Test2.csv", "CSV"]
>
> (* compare data *) mIn==m
>
>>> True
>
>
> (* see memory usage *) ByteCount[m]
>>> 24168
>
> ByteCount[mIn]
>>> 144040
>
> (* what is the minimum size using reals *) 1000 * 3 * 8
>>> 24000
>
> Actual size: 24168 Read in file data: 144040
>
> (* Size *) ByteCount[mIn]/ByteCount[m] * 1.0
>>> 5.96
>
> --------------
>
> Why is the "same" data taking up 6x the memory after being written to
> disk and read back in. This is a serious problem as we have large
> data at work being shared by files and Mathematica is currently the
> only language that can't read it (C# being the other language).
The keyword is "packed arrays":
Developer`PackedArrayQ[m]
Developer`PackedArrayQ[mIn]
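To see the effect without going through a file, here is a minimal sketch that rebuilds the table from the question and makes an unpacked copy of it. The names unpacked and repacked are only illustrative, and the exact ByteCount of the unpacked copy may differ between versions, so no number is claimed for it here:

```mathematica
(* same table as in the question; Table returns it as a packed array of reals *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* same values, but stored as an ordinary nested list *)
unpacked = Developer`FromPackedArray[m];

Developer`PackedArrayQ /@ {m, unpacked}   (* packed, then not packed *)
m == unpacked                             (* the values still compare equal *)
ByteCount /@ {m, unpacked}                (* the unpacked copy needs much more memory *)

(* packing again should bring the size back down to that of m *)
repacked = Developer`ToPackedArray[unpacked];
ByteCount[repacked] == ByteCount[m]
```

The imported mIn should behave like unpacked here: equal in value, but stored element by element instead of as one contiguous block of machine reals.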
> How can they pass for equal yet internally generated takes much less
> memory?
They compare equal because packing only changes how the numbers are stored,
not their values. You can find some information about packed arrays in the
documentation, and a lot more in this newsgroup's archive and on Mathematica
Stack Exchange.
> Any input welcome as I can only reduce the data with an external
> editor so I can try to work with it.
What you can try is to convert it to a packed array after importing:
mIn = Developer`ToPackedArray[mIn];
Unfortunately, that doesn't help if the Import itself fails because there
isn't enough memory to hold the intermediate (unpacked) expression. In that
case you might have to read the data line by line (or a chunk of lines at a
time) and convert each chunk to a packed array as you go along. This can be
done, but it takes some effort, as you have to ensure that the data is never
unpacked.
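A rough sketch of the chunked approach, under some assumptions: the file is purely numeric and rectangular (no header row, no missing fields), importPackedCSV is a hypothetical helper name and not a built-in, and the default chunk size of 10000 lines is an arbitrary choice you would need to tune. It is untested on really large files:

```mathematica
(* hypothetical helper: read a numeric CSV file chunkSize lines at a time
   and pack each chunk before the next one is read *)
importPackedCSV[file_String, chunkSize_Integer: 10000] :=
 Module[{stream, lines, chunks = {}},
  stream = OpenRead[file];
  While[
   (* ReadList returns {} once the end of the file is reached *)
   (lines = ReadList[stream, String, chunkSize]) =!= {},
   (* parse only this chunk, force machine reals with N, then pack it *)
   AppendTo[chunks,
    Developer`ToPackedArray[
     N[ImportString[StringJoin[Riffle[lines, "\n"]], "CSV"]]]]];
  Close[stream];
  (* joining packed chunks of the same type should again give a packed array *)
  Join @@ chunks]

mIn = importPackedCSV["Test2.csv"];
Developer`PackedArrayQ[mIn]
```

Only one chunk at a time exists in unpacked form, so the peak memory use should be set by the chunk size rather than by the size of the file. If I remember correctly, evaluating On["Packing"] beforehand makes Mathematica print a message whenever something unpacks an array, which can help to check that later processing keeps the data packed.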
hth,
albert