Re: Structure of "identical" data not equal in size
- To: mathgroup at smc.vnet.net
- Subject: [mg126929] Re: Structure of "identical" data not equal in size
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Mon, 18 Jun 2012 05:44:48 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jrk2s2$8n0$1@smc.vnet.net> <4FDDCBE6.2080300@gmail.com> <1BD15C6A-97F6-4B00-BBE1-C82F9EF85CA6@me.com>
Hi Paul,
If you don't need to read this data in any program other than
Mathematica, I recommend using either the MX format or the trick with
Compress described e.g. here:
http://mathematica.stackexchange.com/a/1960/12
The problem with MX is that it is not portable between different
systems/architectures, but it's extremely fast to save/load.
The Compress-trick offers reasonable performance while being portable
between systems.
Both preserve packed arrays (without unpacking), but MX requires less
memory to Import/Export.
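A minimal sketch of both approaches, assuming the matrix m from your example. The file names are placeholders, and the Compress variant is my reading of the linked answer rather than a copy of it:

```mathematica
(* Same test data as in the original question *)
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];

(* MX: fast, but tied to the system/architecture that wrote it *)
mxFile = FileNameJoin[{$TemporaryDirectory, "Test2.mx"}];
Export[mxFile, m, "MX"];
mMX = Import[mxFile, "MX"];

(* Compress trick: store the compressed string as plain text *)
txtFile = FileNameJoin[{$TemporaryDirectory, "Test2-compressed.txt"}];
Export[txtFile, Compress[m], "String"];
mC = Uncompress[Import[txtFile, "String"]];

(* Check that neither round trip unpacked the array *)
Developer`PackedArrayQ /@ {mMX, mC}
```

Comparing ByteCount[mMX] and ByteCount[mC] with ByteCount[m] is a quick way to confirm that the memory footprint is unchanged.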
If you need to exchange data with other programs, BinaryRead/BinaryWrite
may be the way to go.
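For the binary route, here is a rough sketch, assuming a headerless file of little-endian 64-bit reals in row-major order (that layout is my assumption about what the C# side would read and write, so adjust as needed). As far as I know, reading with the single type "Real64" gives a flat packed array, whereas a per-record type list such as {"Real64", "Real64", "Real64"} may come back as an unpacked list of lists:

```mathematica
m = Table[{i, Sin[i], Cos[i]}, {i, 1, 1000, 1.0}];
binFile = FileNameJoin[{$TemporaryDirectory, "Test2.bin"}];

(* Write all values as one flat run of Real64; ByteOrdering -> -1 is little-endian *)
stream = OpenWrite[binFile, BinaryFormat -> True];
BinaryWrite[stream, Flatten[m], "Real64", ByteOrdering -> -1];
Close[stream];

(* Read back as a flat list, then restore the three-column shape *)
flat = BinaryReadList[binFile, "Real64", ByteOrdering -> -1];
mBin = Partition[flat, 3];

(* Compare packing and memory use against the original *)
{Developer`PackedArrayQ[mBin], ByteCount[mBin], ByteCount[m]}
```

The column count (3 here) has to be known in advance or stored separately, since the file itself carries no shape information.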
On 18 June 2012 00:15, Paul E McHale <paulmchale at me.com> wrote:
>
> On Jun 17, 2012, at 8:21 AM, Szabolcs Horvát wrote:
>
> > On 2012.06.17. 10:01, Paul E McHale wrote:
> >> (* First, the internal data and then writing it to CSV *)
> >> m=Table[{i,Sin[i],Cos[i]},{i,1,1000,1.0}];
> >> SetDirectory[NotebookDirectory[]]
> >> Export["Test2.csv",m, "CSV"]
> >>
> >> (*Reading it back in *)
> >> mIn = Import["Test2.csv", "CSV"]
> >>
> >> (* compare data *)
> >> mIn==m
> >>
> >>>> True
> >>
> >>
> >> (* see memory usage *)
> >> ByteCount[m]
> >>>> 24168
> >>
> >> ByteCount[mIn]
> >>>> 144040
> >>
> >> (* what is the minimum size using reals *)
> >> 1000 * 3 * 8
> >>>> 24000
> >>
> >> Actual size: 24168
> >> Read in file data: 144040
> >>
> >> (* Size *)
> >> ByteCount[mIn]/ByteCount[m] * 1.0
> >>>> 5.96
> >>
> >> --------------
> >>
> >> Why is the "same" data taking up 6x the memory after being written to
> disk and read back in. This is a serious problem as we have large data at
> work being shared by files and Mathematica is currently the only language
> that can't read it (C# being the other language).
> >>
> >> How can they pass for equal yet internally generated takes much less
> memory?
> >>
> >> Any input welcome as I can only reduce the data with an external editor
> so I can try to work with it.
> >>
> >
> > The reason is that mIn is not a packed array:
> >
> > Developer`PackedArrayQ[m]
> >
> > (* ==> True *)
> >
> > Developer`PackedArrayQ[mIn]
> >
> > (* ==> False *)
> >
> > I guess the reason Import doesn't return a packed array is that CSV
> files can hold inhomogeneous data (e.g. both numbers and strings) while
> packed arrays are always homogeneous. You can try exporting to other
> formats, perhaps as binary data (which should still be easy to read using
> C#)
> >
> > Alternatively convert back to a packed array right after you read it in:
> >
> > Developer`ToPackedArray[mIn]
> >
> > Packed arrays give you the same storage efficiency as C#.
>
>
> ---------------
> Thank you for the reply. This works acceptably if the data can fit in
> memory as an unpacked array. It appears there is no reason for the array to
> be unpacked other than that Mathematica does not pack while importing, even
> if the file is homogeneous. I have tried binary with homogeneous
> specifications, {Real64, Real64, Real64}. Is there any format that will
> read homogeneous data in a packed array format?
>
> The unpacked arrays are simply too huge, taking over 5x the memory.
>
> Thank you,
> Paul
>
>
>
> >
> > --
> > Szabolcs Horvát
> > Visit Mathematica.SE: http://mathematica.stackexchange.com/
>
>