MathGroup Archive 2010

Re: Loading portion of large HDF5 array?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg114126] Re: Loading portion of large HDF5 array?
  • From: Paul <pnorthug at gmail.com>
  • Date: Wed, 24 Nov 2010 07:00:16 -0500 (EST)
  • References: <icdoch$6i8$1@smc.vnet.net> <icg6pq$97i$1@smc.vnet.net>

On Nov 23, 2:59 am, Paul <pnort... at gmail.com> wrote:
> On Nov 22, 4:40 am, Bill Rowe <readn... at sbcglobal.net> wrote:
>
> > On 11/20/10 at 6:27 PM, pnort... at gmail.com (Paul) wrote:
>
> > >I have a large matrix (>10gb) in an HDF5 file.
> > >Is there a way to read only a portion of this matrix using Import[]
> > >and the HDF5 import format?
>
> > Yes. You can read various portions of the file. See
>
> > ref/format/HDF5
>
> > in the Documentation Center for details.
>
> A specific example is below (snipped output from h5ls -vlr) with
> matrix '/data' with dimensions ~ {10^9, 51}. How would I read in the
> first 1000 rows, the next 1000? Thanks for the documentation pointer
> but I didn't find any way to do this. I understand you can load in
> datasets separately but maybe not a portion of a single dataset.
>
> Import["file.h5", {"Datasets", "/data"}] attempts to load the full
> matrix.
>
> /data                    Dataset {110945492/Inf, 51/51}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 51} 204000 bytes
>     Storage:   1158043888 logical bytes, 3840201966 allocated bytes,
>                107.67% utilization
>     Filter-0:  deflate-1 OPT {4}
>     Type:      IEEE 32-bit little-endian float

To do this, I wrote a Cython MathLink function that calls h5py, a
Python wrapper for the HDF5 libraries. With h5py, it is trivial to read
slices of HDF5 dataset matrices. The function is called from Mathematica as:

data = HDF5ReadRows[filename, dataset, start, end];

The corresponding code snippet in my MathLink function is:

import h5py
with h5py.File(filename, 'r') as h5:  # file is closed on exit
    data = h5[dataset][start:end]  # only rows start..end-1 are read from disk

in case anyone else has to do this.
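
For the "first 1000 rows, then the next 1000" case asked about earlier in
the thread, the same slicing can be wrapped in a small generator. This is
only a sketch, not the code from my MathLink function: the names
iter_row_blocks and count_rows_by_block are made up for illustration, and
the default block size of 1000 is an assumption chosen to match the
{1000, 51} chunk shape reported by h5ls above.

```python
def iter_row_blocks(dset, block_rows=1000):
    """Yield (start, block) pairs, block_rows rows at a time.

    dset is anything with a .shape tuple and slice indexing,
    e.g. an h5py dataset. Only one block is held in memory at once.
    """
    n_rows = dset.shape[0]
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)  # last block may be short
        yield start, dset[start:stop]


def count_rows_by_block(filename, dataset, block_rows=1000):
    """Hypothetical driver: walk an HDF5 dataset block by block."""
    import h5py  # third-party; imported lazily so iter_row_blocks has no dependencies

    total = 0
    with h5py.File(filename, "r") as h5:
        for start, block in iter_row_blocks(h5[dataset], block_rows):
            total += block.shape[0]  # replace with real per-block processing
    return total
```

Keeping the block size a multiple of the chunk's row count should mean that
each compressed chunk is decompressed once rather than once per overlapping
read, though I have not measured this.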

