MathGroup Archive 2005

Re: Working with huge text files with mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg56596] Re: Working with huge text files with mathematica
  • From: Maxim <ab_def at prontomail.com>
  • Date: Fri, 29 Apr 2005 03:22:11 -0400 (EDT)
  • References: <d4clv2$357$1@smc.vnet.net> <d4mrtu$1uq$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

On Wed, 27 Apr 2005 02:03:42 +0000 (UTC), David Bailey  
<dave at Remove_Thisdbailey.co.uk> wrote:

> Andrey Shevchuk wrote:
>> Hi Everybody,
>>
>> I face a problem when I try to read in a huge data file with
>> Mathematica 5.1.
>>
>> The file is approx. 4 GB in size and was created by another Mathematica
>> application (so I think Mathematica should be able to handle it).
>>
>> Now, if I try to use Read on a stream from this file, it returns
>> EndOfFile and nothing else!
>> I checked the file with HexEdit (an editor for huge files); it is not
>> corrupt and actually contains the data I need.
>>
>> A similar file (from the same application) but 1Gb works perfectly.
>> Any ideas?
>>
>> Does Mathematica have an internal limit on file size (and if so, can
>> one somehow override it)?
>>
>> I would appreciate any feedback!
>>
>>
>>
> It would seem from what others have written that there is a 2G limit on
> the size of files that can be read by Mathematica under Windows. This
> must be a bug in Mathematica - Windows itself can exceed this limit.
> Rather than changing operating systems or splitting the file, you might
> want to try using J/Link and reading the file via Java. For best
> performance, however, you might find it better to read the file in
> substantial chunks and buffer it inside Mathematica.
>
> David Bailey
> dbaileyconsultancy.co.uk
>

Here's my attempt to do it with J/Link. Creating a test file:

sz = 10^6;                      (* bytes per chunk *)
chunks = Ceiling[2^31 / sz];    (* enough chunks to pass 2^31 bytes *)
L = Table[Random[Integer, {0, 255}], {sz}];
f = OpenWrite["/math/2gb", BinaryFormat -> True];
Do[BinaryWrite[f, L], {chunks}];
Close[f];

And now reading it back (note that the read buffer has to be created on  
the Java side):

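<<JLink`
InstallJava[];
f = JavaNew["java.io.FileInputStream", "/math/2gb"];
L2 = JavaNew["[B", sz];    (* Java byte[sz], used as the read buffer *)
(* every read must fill the buffer and reproduce L;
   Java bytes are signed, hence the Mod *)
goodF = And@@ Table[
   f@read[L2] == sz &&
     Mod[JavaObjectToExpression[L2], 256] == L,
   {chunks}];
f@close[];
ReleaseJavaObject[L2];
goodF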

This should return True, but there is a problem: while Table is being
evaluated, memory usage stays constant (as one would expect) until
approximately half of the file has been read. After that it grows almost
linearly until it reaches 2 GB and the Mathematica kernel crashes.

So we can check that the first parts of the file are being read correctly,  
but it's still impossible to process the whole file in one go (without  
skipping some parts using f@skip or RandomAccessFile).
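
A minimal sketch of the RandomAccessFile route, assuming J/Link is
available and the test file was written as above. readChunk is a
hypothetical helper name (not part of J/Link), and I have not checked
how it behaves on files beyond 2 GB. It still goes through
JavaObjectToExpression, so it does not avoid the memory growth if called
many times; it only shows how to jump straight to a given offset:

```mathematica
Needs["JLink`"];
InstallJava[];

(* readChunk: hypothetical helper; reads up to n bytes starting at offset *)
readChunk[path_String, offset_Integer, n_Integer] :=
  JavaBlock[
    Module[{raf, buf, got, data},
      raf = JavaNew["java.io.RandomAccessFile", path, "r"];
      buf = JavaNew["[B", n];       (* Java byte[n] buffer *)
      raf@seek[offset];             (* seek takes a Java long *)
      got = raf@read[buf];          (* bytes actually read, -1 at end of file *)
      data = Mod[JavaObjectToExpression[buf], 256];  (* signed bytes -> 0..255 *)
      raf@close[];
      If[got > 0, Take[data, got], {}]
    ]
  ]
```

For example, readChunk["/math/2gb", (chunks - 1) sz, sz] reads the last
chunk, which could then be compared with L. JavaBlock releases the Java
objects created inside it, so no explicit ReleaseJavaObject is needed
here.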

Testing the parts of the code separately, we can see that  
JavaObjectToExpression is the culprit:

L2 = JavaNew["[B", sz];
Do[JavaObjectToExpression[L2], {chunks}]

causes the memory blow-up.
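
One way to put numbers on this is to sample MemoryInUse[] between
batches of conversions. This is only a sketch: the batch size of 100 and
the 10 samples are arbitrary choices of mine, and MemoryInUse[] reports
kernel memory only, so growth on the Java side would not show up here:

```mathematica
Needs["JLink`"];
InstallJava[];
sz = 10^6;
L2 = JavaNew["[B", sz];
(* record kernel memory after each batch of 100 conversions *)
mem = Table[
   Do[JavaObjectToExpression[L2], {100}];
   MemoryInUse[],
   {10}];
ReleaseJavaObject[L2];
(* growth in bytes from one batch to the next *)
Drop[mem, 1] - Drop[mem, -1]
```

If the differences stay near zero for a while and then become steadily
positive, that would match the behaviour seen with the full file.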

Maxim Rytin
m.r at inbox.ru

