ReadList: Is it really slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg24450] ReadList: Is it really slow?
- From: "anonymous" <guest at anonymous.com>
- Date: Tue, 18 Jul 2000 00:58:52 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Many posts in this group say that ReadList is slow. In my tests, ReadList
isn't slow in its own right: its throughput is fine. However, ReadList is a
resource hog.
I ran the test shown below on a 1MB file, reading it repeatedly. The idea was
to check the claims in this newsgroup that ReadList has slow throughput.
1st test
50MB read as type Record; comma-separated, mixed data types
5,631,350 objects (totalling far more than 50MB) created and stored under
symbols y[1] through y[50]
Time taken:
{156.37 Second, Null}
(AMD K6-2 350MHz, 64MB RAM)
* minimal swapping occurred
2nd test (code not shown below)
10MB read (as type Byte)
10,000,000 objects created and stored under symbols y[1] through y[10]
Time taken:
{102.16 Second, Null}
* swapping occurred after a few megabytes read
* Machine: AMD K6-2 350MHz, 64MB RAM
* File in the 1st test was a 1MB comma-separated values file with 17 fields
of numbers and strings.
* File in the 2nd test was a 1MB random data file.
However, when I tried to read a single 10MB file in either test, so much
swapping occurred that I stopped the test after several minutes without
letting it finish.
So the moral of the story is: don't read the whole file at once. Open the
file. Read it a chunk at a time. Stick to a chunk size that doesn't cause
swapping. Also (as the data above shows), think about how many objects you
are asking Mathematica to create.
Conclusion: Using ReadList to interpret one line at a time out of a large
"comma separated" file is far faster than running ReadList against the whole
file. As long as nothing swaps, I would expect the read time to grow roughly
linearly with the file size.
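The chunked approach can be sketched roughly as follows. This is only an illustration, not the code used for the timings above: the file name "test1.csv", the chunk size of 1000 lines, and the helper parseLine are all hypothetical. It reads a block of lines as strings, then splits each line on commas with ReadList applied to a string stream.

```mathematica
(* Hypothetical file name and chunk size; pick a size that does not swap. *)
file = "test1.csv";
chunkSize = 1000;  (* lines read per ReadList call *)

(* Hypothetical helper: split one line into comma-separated fields,
   keeping empty fields, by reading the line through a string stream. *)
parseLine[line_String] := Module[{s = StringToStream[line], fields},
  fields = ReadList[s, Record, RecordSeparators -> {","},
    NullRecords -> True];
  Close[s];
  fields
]

stream = OpenRead[file];
total = 0;
(* ReadList returns {} once the end of the file has been reached. *)
While[(chunk = ReadList[stream, String, chunkSize]) =!= {},
  records = parseLine /@ chunk;
  (* Process records here and keep only what is needed, so that the
     whole file never sits in the kernel at once. *)
  total += Length[records]
];
Close[stream];
total  (* number of lines processed *)
```

Only one chunk is held at a time, so memory use depends on chunkSize rather than on the file size. Blank lines in the input may need separate handling.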
Below: this is test1. It reads copies of a 1MB "comma separated" file, 50
reads in total.
NOTE: it stores all 50MB in the Mathematica kernel.
Timing[
  Do[
    Do[
      (* i = 1..5 selects "Copy (2) of test1" through "Copy (6) of test1" *)
      x = OpenRead[
        "d:\\usr\\john\\math\\Mathematica\\Hours\\tmp\\Copy (" <>
          FromCharacterCode[i + 48 + 1] <> ") of test1",
        DOSTextFormat -> False];
      k = (j - 1)*5 + i;  (* running index 1..50 *)
      y[k] = ReadList[x, Record, RecordLists -> True, NullRecords -> True,
        RecordSeparators -> {","}];
      Close[x],
      {i, 1, 5}
    ];
    Print[j*5],  (* progress: number of files read so far *)
    {j, 1, 10}
  ]
]