Re: Structure of "identical" data not equal in size
- To: mathgroup at smc.vnet.net
- Subject: [mg126929] Re: Structure of "identical" data not equal in size
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Mon, 18 Jun 2012 05:44:48 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jrk2s2$8n0$1@smc.vnet.net> <4FDDCBE6.2080300@gmail.com> <1BD15C6A-97F6-4B00-BBE1-C82F9EF85CA6@me.com>
Hi Paul,

If you don't need to import this data into a program other than Mathematica, then I recommend using either the MX format or the trick with Compress described e.g. here:

http://mathematica.stackexchange.com/a/1960/12

The problem with MX is that it is not portable between different systems/architectures, but it's extremely fast to save/load. The Compress trick offers reasonable performance while being portable between systems. Both preserve packed arrays (without unpacking), but MX requires less memory to Import/Export.

If you need to exchange data with other programs, BinaryRead/BinaryWrite may be the way to go.

On 18 June 2012 00:15, Paul E McHale <paulmchale at me.com> wrote:
>
> On Jun 17, 2012, at 8:21 AM, Szabolcs Horvát wrote:
>
> > On 2012.06.17. 10:01, Paul E McHale wrote:
> >> (* First, the internal data and then writing it to CSV *)
> >> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> >> SetDirectory[NotebookDirectory[]]
> >> Export["Test2.csv",m, "CSV"]
> >>
> >> (*Reading it back in *)
> >> mIn = Import["Test2.csv", "CSV"]
> >>
> >> (* compare data *)
> >> mIn==m
> >>
> >>>> True
> >>
> >>
> >> (* see memory usage *)
> >> ByteCount[m]
> >>>> 24168
> >>
> >> ByteCount[mIn]
> >>>> 144040
> >>
> >> (* what is the minimum size using reals *)
> >> 1000 * 3 * 8
> >>>> 24000
> >>
> >> Actual size: 24168
> >> Read in file data: 144040
> >>
> >> (* Size *)
> >> ByteCount[mIn]/ByteCount[m] * 1.0
> >>>> 5.96
> >>
> >> --------------
> >>
> >> Why is the "same" data taking up 6x the memory after being written to disk and read back in? This is a serious problem as we have large data at work being shared by files and Mathematica is currently the only language that can't read it (C# being the other language).
> >>
> >> How can they pass for equal yet internally generated takes much less memory?
> >>
> >> Any input welcome as I can only reduce the data with an external editor so I can try to work with it.
> >>
> >
> > The reason is that mIn is not a packed array:
> >
> > Developer`PackedArrayQ[m]
> >
> > (* ==> True *)
> >
> > Developer`PackedArrayQ[mIn]
> >
> > (* ==> False *)
> >
> > I guess the reason Import doesn't return a packed array is that CSV files can hold inhomogeneous data (e.g. both numbers and strings) while packed arrays are always homogeneous. You can try exporting to other formats, perhaps as binary data (which should still be easy to read using C#)
> >
> > Alternatively convert back to a packed array right after you read it in:
> >
> > Developer`ToPackedArray[mIn]
> >
> > Packed arrays give you the same storage efficiency as C#.
>
> ---------------
> Thank you for the reply. This works acceptably if the data can fit in memory as an unpacked array. It appears there is no reason for the array to be unpacked other than that Mathematica does not pack while importing even if the file is homogeneous. I have tried binary with homogeneous specifications, {Real64, Real64, Real64}. Is there any format that will read homogeneous data in a packed array format?
>
> The unpacked arrays are simply too huge, taking over 5x the memory.
>
> Thank you,
> Paul
>
> > --
> > Szabolcs Horvát
> > Visit Mathematica.SE: http://mathematica.stackexchange.com/
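
For reference, here is a rough sketch of the three options mentioned at the top of this message, applied to the same test matrix. The file names (Test2.mx, Test2.txt, Test2.bin) are placeholders, and the Compress variant shown here is just one plausible way to use it, not necessarily the exact recipe from the linked answer. The binary variant assumes a single "Real64" type for all values and that the reader on the other side uses the same byte order.

```mathematica
(* Sketch only; Test2.mx, Test2.txt and Test2.bin are placeholder file names *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
SetDirectory[NotebookDirectory[]];

(* 1. MX: fast, but tied to the system/architecture that wrote the file *)
Export["Test2.mx", m, "MX"];
mMX = Import["Test2.mx", "MX"];

(* 2. Compress: save the compressed string as plain text, portable between systems *)
Export["Test2.txt", Compress[m], "String"];
mC = Uncompress[Import["Test2.txt", "String"]];

(* 3. Raw binary: write all values as consecutive Real64 numbers, row by row *)
strm = OpenWrite["Test2.bin", BinaryFormat -> True];
BinaryWrite[strm, Flatten[m], "Real64"];
Close[strm];
(* read back a flat list of Real64 values, then restore the three columns *)
mB = Partition[BinaryReadList["Test2.bin", "Real64"], 3];

(* check whether each result is packed and compare sizes with the original *)
Developer`PackedArrayQ /@ {mMX, mC, mB}
ByteCount /@ {m, mMX, mC, mB}
```

If any of the results turns out not to be packed on your version, Developer`ToPackedArray can still be applied right after reading. The binary file is just a sequence of 8-byte doubles, so it should be straightforward to read from C# as well.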