MathGroup Archive: July 2000 [00264]


ReadList: Is it really slow?

To: mathgroup at smc.vnet.net
Subject: [mg24450] ReadList: Is it really slow?
From: "anonymous" <guest at anonymous.com>
Date: Tue, 18 Jul 2000 00:58:52 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

I found many entries in this group saying that ReadList is slow.  In my tests,
ReadList isn't slow in its own right.  Its throughput is fine; the real
problem is that ReadList is a resource hog.

I ran the test below on a 1MB file, reading it repeatedly.  The idea was to
check the claims in this newsgroup that ReadList has slow throughput.
1st test
    50MB read as type Record; comma-separated, mixed data types
    5,631,350 objects (totalling far more than 50MB) created and stored
    under symbols y[1] through y[50]
    Time taken:
        {156.37 Second, Null}
    * minimal swapping occurred
2nd test (code not shown below)
    10MB read as type Byte
    10,000,000 objects created and stored under symbols y[1] through y[10]
    Time taken:
        {102.16 Second, Null}
    * swapping began after a few megabytes had been read

* Machine: AMD K6-2 350MHz, 64MB RAM
* File in the 1st test was a 1MB comma-separated-values file with 17 fields
of numbers and strings.
* File in the 2nd test was a 1MB random data file.

However, when I tried to read a single 10MB file in either test, so much
swapping occurred that I stopped the test after several minutes without
letting it finish.

So the moral of the story is: don't read the whole file at once.  Open the
file and read it a chunk at a time, keeping each chunk small enough that the
machine doesn't start swapping.  Also (as the data above shows), think about
how many objects you're asking Mathematica to create.

Conclusion:  Using ReadList to interpret one line at a time out of a large
comma-separated file is far faster than running ReadList against the whole
file.  I would expect the time taken to scale nearly linearly with the amount
of data.
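
For anyone who wants to try the chunked approach, here is a minimal sketch.
It was not part of my timing tests.  The function name countFieldsInChunks,
the file name, and the chunk size are all made up for illustration, and
counting fields merely stands in for whatever per-chunk processing you really
need.  It relies on ReadList[stream, String, n] reading at most n lines from
the current position of an open stream, and on StringSplit, which needs a
newer Mathematica than the one used above.

```mathematica
(* Sketch only: read a text file a few lines at a time, so that just one
   chunk of objects is held in the kernel at any moment.
   countFieldsInChunks and chunkSize are hypothetical names. *)
countFieldsInChunks[file_String, chunkSize_Integer] :=
  Module[{stream, chunk, count = 0},
    stream = OpenRead[file];
    (* ReadList returns {} once the end of the file has been reached *)
    While[(chunk = ReadList[stream, String, chunkSize]) =!= {},
      (* stand-in for real work: split each line at commas, count fields *)
      count += Total[Length[StringSplit[#, ","]] & /@ chunk]
    ];
    Close[stream];
    count
  ]
```

A call would look like countFieldsInChunks["test1", 1000].  Pick the chunk
size so that a single chunk stays comfortably inside physical memory.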

Below is test 1.  It reads a 1MB comma-separated file 50 times (five copies
of the file, ten passes over each).
NOTE: it stores all 50MB of data in the Mathematica kernel.

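Timing[
  Do[
    Do[
      (* open one of the five copies, "Copy (2) of test1" .. "Copy (6) of test1" *)
      x = OpenRead[
          "d:\\usr\\john\\math\\Mathematica\\Hours\\tmp\\Copy (" <>
            FromCharacterCode[i + 48 + 1] <> ") of test1",
          DOSTextFormat -> False];
      k = (j - 1)*5 + i;   (* running index 1..50 *)
      (* read the whole file as comma-separated records and keep the result *)
      y[k] =
        ReadList[x, Record, RecordLists -> True, NullRecords -> True,
          RecordSeparators -> {","}];
      Close[x],
      {i, 1, 5}
      ];   (* this separator was missing in the original listing *)
    Print[j*5],   (* progress: number of files read so far *)
    {j, 1, 10}
    ]
  ]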
