Re: Loading portion of large HDF5 array?
- To: mathgroup at smc.vnet.net
- Subject: [mg114126] Re: Loading portion of large HDF5 array?
- From: Paul <pnorthug at gmail.com>
- Date: Wed, 24 Nov 2010 07:00:16 -0500 (EST)
- References: <icdoch$6i8$1@smc.vnet.net> <icg6pq$97i$1@smc.vnet.net>
On Nov 23, 2:59 am, Paul <pnort... at gmail.com> wrote:
> On Nov 22, 4:40 am, Bill Rowe <readn... at sbcglobal.net> wrote:
>
> > On 11/20/10 at 6:27 PM, pnort... at gmail.com (Paul) wrote:
> >
> > >I have a large matrix (>10gb) in an HDF5 file.
> > >Is there a way to read only a portion of this matrix using Import[]
> > >and the HDF5 import format?
> >
> > Yes. You can read various portions of the file. See
> >
> > ref/format/HDF5
> >
> > in the DocumentCenter for details
>
> A specific example is below (snipped output from h5ls -vlr) with
> matrix '/data' with dimensions ~ {10^9, 51}. How would I read in the
> first 1000 rows, the next 1000? Thanks for the documentation pointer
> but I didn't find any way to do this. I understand you can load in
> datasets separately but maybe not a portion of a single dataset.
>
> Import["file.h5", {"Datasets", "/data"}] attempts to load the full
> matrix.
>
> /data                    Dataset {110945492/Inf, 51/51}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 51} 204000 bytes
>     Storage:   1158043888 logical bytes, 3840201966 allocated bytes,
> 107.67% utilization
>     Filter-0:  deflate-1 OPT {4}
>     Type:      IEEE 32-bit little-endian float

To do this, I wrote a Cython MathLink function that calls h5py, a Python
wrapper around the HDF5 libraries. With h5py, it's trivial to read slices
of HDF5 dataset matrices. From Mathematica, the call looks like this:

    data = HDF5ReadRows[filename, dataset, start, end];

The corresponding code snippet in my MathLink function is:

    h5 = h5py.File(filename, 'r')
    data = h5[dataset][start:end][:]

I'm posting this in case anyone else has to do the same thing.
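
For anyone who wants to try the h5py side on its own, here is a minimal sketch of reading a large dataset one block of rows at a time. The helper names `row_blocks`, `read_rows`, and `iter_row_blocks` are made up for illustration; they are not part of h5py or of the MathLink function described above. The sketch assumes h5py and NumPy are installed and that rows run along the first axis of the dataset.

```python
def row_blocks(n_rows, block_size):
    """Yield half-open (start, end) row ranges that cover n_rows rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def read_rows(filename, dataset, start, end):
    """Return rows [start, end) of `dataset` as a NumPy array.

    Only the requested slice is read into memory, not the whole dataset.
    """
    import h5py  # third-party; imported here so row_blocks has no dependencies

    with h5py.File(filename, "r") as h5:
        # Slicing an h5py dataset already returns an in-memory NumPy array,
        # so no trailing [:] is needed.
        return h5[dataset][start:end]


def iter_row_blocks(filename, dataset, block_size=1000):
    """Yield successive blocks of at most block_size rows."""
    import h5py

    with h5py.File(filename, "r") as h5:
        dset = h5[dataset]
        for start, end in row_blocks(dset.shape[0], block_size):
            yield dset[start:end]
```

With these helpers, `read_rows("file.h5", "/data", 0, 1000)` would return the first 1000 rows and `read_rows("file.h5", "/data", 1000, 2000)` the next 1000. Note that Python slices are 0-based and exclude the end index, unlike Mathematica's 1-based spans. Choosing a block size that is a multiple of the chunk length (1000 rows in the h5ls output above) should keep each read aligned with whole chunks.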