MathGroup Archive: June 2012 [00239]


Re: Structure of "identical" data not equal in size

To: mathgroup at smc.vnet.net
Subject: [mg126929] Re: Structure of "identical" data not equal in size
From: Szabolcs Horvát <szhorvat at gmail.com>
Date: Mon, 18 Jun 2012 05:44:48 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <jrk2s2$8n0$1@smc.vnet.net> <4FDDCBE6.2080300@gmail.com> <1BD15C6A-97F6-4B00-BBE1-C82F9EF85CA6@me.com>

Hi Paul,

If you don't need to import this data into any program other than
Mathematica, then I recommend using either the MX format or the trick with
Compress described e.g. here:

http://mathematica.stackexchange.com/a/1960/12

The problem with MX is that it is not portable between different
systems/architectures, but it's extremely fast to save/load.

The Compress-trick offers reasonable performance while being portable
between systems.

Both preserve packed arrays (without unpacking), but MX requires less
memory to Import/Export.
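
As a rough sketch of the two options (the file names are placeholders, and
the Compress variant shown is one common form of the trick, assumed here
rather than copied from the linked answer), a round trip might look like
this:

```mathematica
(* Sample data: a packed 1000 x 3 array of machine reals *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* Placeholder file names in the temporary directory *)
mxFile = FileNameJoin[{$TemporaryDirectory, "Test2.mx"}];
txtFile = FileNameJoin[{$TemporaryDirectory, "Test2.txt"}];

(* MX: fast, but tied to the system/architecture that wrote the file *)
Export[mxFile, m, "MX"];
mMX = Import[mxFile, "MX"];

(* Compress trick: store the compressed expression as plain text *)
Export[txtFile, Compress[m], "String"];
mC = Uncompress[Import[txtFile, "String"]];

(* Check that both round trips kept the data and the packing *)
{mMX == m, mC == m}
Developer`PackedArrayQ /@ {mMX, mC}
ByteCount /@ {m, mMX, mC}
```

Comparing the ByteCount values with that of the CSV-imported array should
show whether the packing survived on your system.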

If you need to exchange data with other programs, BinaryRead/BinaryWrite
may be the way to go.
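
A minimal sketch of the binary route, assuming the file holds nothing but
64-bit reals written row by row (the file name and the three-column layout
are assumptions taken from the example data):

```mathematica
(* Sample data: a packed 1000 x 3 array of machine reals *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
binFile = FileNameJoin[{$TemporaryDirectory, "Test2.bin"}];

(* Write the values row by row as raw 64-bit reals *)
stream = OpenWrite[binFile, BinaryFormat -> True];
BinaryWrite[stream, Flatten[m], "Real64"];
Close[stream];

(* Read all values back as one flat list, then restore the 3 columns *)
mBin = Partition[BinaryReadList[binFile, "Real64"], 3];

(* Check that the data round-trips and whether the result is packed *)
mBin == m
Developer`PackedArrayQ[mBin]
```

Reading with the single type "Real64" rather than a per-record
specification such as {"Real64", "Real64", "Real64"} gives one flat list of
numbers, which may be what allows the result to stay packed.  The byte
order follows the ByteOrdering option, so it is worth checking that it
matches what the C# side expects.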

On 18 June 2012 00:15, Paul E McHale <paulmchale at me.com> wrote:

>
> On Jun 17, 2012, at 8:21 AM, Szabolcs Horvát wrote:
>
> > On 2012.06.17. 10:01, Paul E McHale wrote:
> >> (* First, the internal data and then writing it to CSV   *)
> >> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> >> SetDirectory[NotebookDirectory[]]
> >> Export["Test2.csv",m, "CSV"]
> >>
> >> (*Reading it back in *)
> >> mIn = Import["Test2.csv", "CSV"]
> >>
> >> (* compare data *)
> >> mIn==m
> >>
> >>>> True
> >>
> >>
> >> (* see memory usage *)
> >> ByteCount[m]
> >>>> 24168
> >>
> >> ByteCount[mIn]
> >>>> 144040
> >>
> >> (* what is the minimum size using reals *)
> >> 1000 * 3 * 8
> >>>> 24000
> >>
> >> Actual size: 24168
> >> Read in file data: 144040
> >>
> >> (* Size *)
> >> ByteCount[mIn]/ByteCount[m] * 1.0
> >>>> 5.96
> >>
> >> --------------
> >>
> >> Why is the "same" data taking up 6x the memory after being written to
> >> disk and read back in?  This is a serious problem as we have large data
> >> at work being shared by files and Mathematica is currently the only
> >> language that can't read it (C# being the other language).
> >>
> >> How can they pass for equal when the internally generated data takes
> >> much less memory?
> >>
> >> Any input welcome as I can only reduce the data with an external editor
> >> so I can try to work with it.
> >>
> >
> > The reason is that mIn is not a packed array:
> >
> > Developer`PackedArrayQ[m]
> >
> > (* ==> True *)
> >
> > Developer`PackedArrayQ[mIn]
> >
> > (* ==> False *)
> >
> > I guess the reason Import doesn't return a packed array is that CSV
> > files can hold inhomogeneous data (e.g. both numbers and strings) while
> > packed arrays are always homogeneous.  You can try exporting to other
> > formats, perhaps as binary data (which should still be easy to read using
> > C#).
> >
> > Alternatively convert back to a packed array right after you read it in:
> >
> > Developer`ToPackedArray[mIn]
> >
> > Packed arrays give you the same storage efficiency as C#.
>
>
> ---------------
> Thank you for the reply.  This works acceptably if the data can fit in
> memory as an unpacked array.  It appears there is no reason for the array
> to be unpacked other than that Mathematica does not pack while importing,
> even if the file is homogeneous.  I have tried binary with homogeneous
> specifications, {Real64, Real64, Real64}.  Is there any format that will
> read homogeneous data in a packed array format?
>
> The unpacked arrays are simply too large, taking over 5x the memory.
>
> Thank you,
> Paul
>
>
>
> >
> > --
> > Szabolcs Horvát
> > Visit Mathematica.SE:  http://mathematica.stackexchange.com/
>
>
