ReadList: Is it really slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg24450] ReadList: Is it really slow?
- From: "anonymous" <guest at anonymous.com>
- Date: Tue, 18 Jul 2000 00:58:52 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
I found many entries in this group saying that ReadList is slow. In my tests, ReadList isn't slow in its own right. Its throughput is fine. However, ReadList is a resource hog.

To check the claims in this newsgroup that ReadList has slow throughput, I ran the tests below against a 1MB file, reading it repeatedly.

Machine for both tests: AMD K6-2 350MHz, 64MB RAM.

1st test
* File: a 1MB CommaSeparatedValues file containing numbers and strings, 17 fields per line.
* 50MB read as type Record; comma-separated, mixed data types.
* 5,631,350 objects created (totalling far more than 50MB) and stored under the symbols y[1] to y[50].
* Time taken: {156.37 Second, Null}
* Minimal swapping occurred.

2nd test (code not shown below)
* File: a 1MB random data file.
* 10MB read as type Byte.
* 10,000,000 objects created and stored under the symbols y[1] to y[10].
* Time taken: {102.16 Second, Null}
* Swapping began after a few megabytes had been read.

However, when I tried to read a single 10MB file in either test, so much swapping occurred that I stopped the test after several minutes without letting it finish.

So the moral of the story is: don't read the whole file at once. Open the file and read it a chunk at a time, keeping each chunk small enough that the machine doesn't begin swapping. Also (as the figures above show) think about how many objects you're asking Mathematica to create.

Conclusion: using ReadList to interpret a large "comma separated" file one line at a time is far faster than running ReadList against the whole file. I would expect the time to grow roughly linearly with file size.

Below is test 1. It reads a 1MB "comma separated" file 50 times (five copies of the file, ten passes each). NOTE: it stores all 50MB in the Mathematica kernel.
Timing[
  Do[
    Do[
      (* i = 1..5 selects "Copy (2) of test1" through "Copy (6) of test1" *)
      x = OpenRead[
        "d:\\usr\\john\\math\\Mathematica\\Hours\\tmp\\Copy (" <>
          FromCharacterCode[i + 48 + 1] <> ") of test1",
        DOSTextFormat -> False];
      k = (j - 1)*5 + i;  (* running index 1..50 *)
      y[k] = ReadList[x, Record, RecordLists -> True,
        NullRecords -> True, RecordSeparators -> {","}];
      Close[x],
      {i, 1, 5}
    ];
    Print[j*5],  (* progress: number of files read so far *)
    {j, 1, 10}
  ]
]
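
The "read a chunk at a time" advice in the conclusion could be sketched roughly as follows. This is not the code I timed above, only an illustration: the file name "test1.csv" and the per-line work (counting fields) are hypothetical placeholders. It opens the file once, pulls one line at a time with Read, and lets ReadList split that single line on commas through a string stream, so only one line's worth of objects exists at any moment.

```mathematica
(* Minimal sketch, assuming a comma-separated text file. *)
(* "test1.csv" is a hypothetical name; substitute your own path. *)
stream = OpenRead["test1.csv"];
count = 0;

(* Read returns the symbol EndOfFile once the stream is exhausted. *)
While[(line = Read[stream, String]) =!= EndOfFile,
  s = StringToStream[line];
  (* Split this one line on commas; NullRecords keeps empty fields. *)
  fields = ReadList[s, Record,
    RecordSeparators -> {","}, NullRecords -> True];
  Close[s];
  (* Placeholder for real per-line work: here we only count fields. *)
  count += Length[fields]
];

Close[stream];
count
```

Because each line's fields are discarded before the next line is read, kernel memory use should stay roughly constant regardless of file size, which is the point of avoiding swapping. If you need to keep results, store only the reduced values you actually want rather than every field.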