Re: Structure of "identical" data not equal in size
- To: mathgroup at smc.vnet.net
- Subject: [mg126927] Re: Structure of "identical" data not equal in size
- From: awnl <awnl at gmx-topmail.de>
- Date: Mon, 18 Jun 2012 05:44:06 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jrk2s2$8n0$1@smc.vnet.net>
Hi,

> (* First, the internal data and then writing it to CSV *)
> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> SetDirectory[NotebookDirectory[]]
> Export["Test2.csv",m, "CSV"]
>
> (*Reading it back in *)
> mIn = Import["Test2.csv", "CSV"]
>
> (* compare data *)
> mIn==m
> >>> True
>
> (* see memory usage *)
> ByteCount[m]
> >>> 24168
>
> ByteCount[mIn]
> >>> 144040
>
> (* what is the minimum size using reals *)
> 1000 * 3 * 8
> >>> 24000
>
> Actual size: 24168
> Read in file data: 144040
>
> (* Size *)
> ByteCount[mIn]/ByteCount[m] * 1.0
> >>> 5.96
>
> --------------
>
> Why is the "same" data taking up 6x the memory after being written to
> disk and read back in. This is a serious problem as we have large
> data at work being shared by files and Mathematica is currently the
> only language that can't read it (C# being the other language).

The keyword is "packed arrays":

Developer`PackedArrayQ[m]
Developer`PackedArrayQ[mIn]

Given your ByteCount results, the first should return True and the second
False: m stores its machine reals contiguously at 8 bytes each, while the
imported mIn is an ordinary nested list in which every number is a separate
expression.

> How can they pass for equal yet internally generated takes much less
> memory?

Equal compares the values, not how they are stored. You can find some
information about packed arrays in the documentation, and a lot more in
this newsgroup's archive and on Mathematica Stack Exchange.

> Any input welcome as I can only reduce the data with an external
> editor so I can try to work with it.

What you can try is to convert it to a packed array after importing:

mIn = Developer`ToPackedArray[mIn];

Unfortunately, that doesn't help if the Import itself fails because there
isn't enough memory to hold the intermediate (unpacked) expression. In that
case you might have to read the data line by line (or a chunk of lines at a
time) and convert it to a packed array as you go along. This can be done,
but it takes some effort, since you have to ensure that the data never gets
unpacked along the way.

hth,

albert
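
For illustration, the chunk-by-chunk reading suggested above might look roughly like the sketch below. It is only a sketch under assumptions, not tested code from this thread: readPackedCSV and its chunkSize argument are hypothetical names, and the file is assumed to be purely numeric, with no header row and the same number of fields on every line.

```mathematica
(* Hypothetical helper, not from the original post: read a purely numeric
   CSV file in chunks, so that only one chunk at a time exists in
   unpacked form. Assumes no header row and equal-length rows. *)
readPackedCSV[file_String, chunkSize_Integer : 10000] :=
  Module[{stream, lines, chunks = {}},
    stream = OpenRead[file];
    While[
      (* ReadList returns {} once the end of the file is reached *)
      (lines = ReadList[stream, String, chunkSize]) =!= {},
      AppendTo[chunks,
        Developer`ToPackedArray[
          (* N turns any integer-looking fields into machine reals *)
          N[ImportString[StringJoin[Riffle[lines, "\n"]], "CSV"]]]]
    ];
    Close[stream];
    (* joining packed chunks of the same type should give a packed result *)
    Join @@ chunks
  ]

mIn = readPackedCSV["Test2.csv"];
Developer`PackedArrayQ[mIn]  (* check that the result really is packed *)
ByteCount[mIn]
```

A smaller chunkSize lowers the peak memory needed for the unpacked intermediate data at the cost of more calls to ImportString. If PackedArrayQ gives False, some field was probably not a plain number.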