Bad imports of data files -- extra empty lists showing up?
- To: mathgroup at smc.vnet.net
- Subject: [mg80571] Bad imports of data files -- extra empty lists showing up?
- From: Curtis Osterhoudt <cfo at lanl.gov>
- Date: Sun, 26 Aug 2007 03:08:27 -0400 (EDT)
- Organization: LANL
- Reply-to: cfo at lanl.gov
Hi, all,

I noticed this problem the other day, on different data sets, and didn't think much of it. Then, when it cropped up again, I started to get worried. I'm really not sure how to think about it, and so am requesting some advice from the experts. I have tried rewriting this message a few times and can't figure out how to state the problem very clearly, so please bear with me.

I know that attachments aren't allowed, but my problem is that if I copy-and-paste the troublesome dataset into this message (I've tried), whatever formatting is causing the problem is lost. For example, I'll paste the data into this message, then copy it from the message to a text file, then save that and import it into Mathematica. The problem disappears. So if anyone is curious, perhaps they can email me directly and I can send some sample "bad" datasets.

The data was taken using a VB program on a Windows machine, and this version of Mathematica is running on a Linux machine. However,

1) the problem crops up in perhaps 10 - 25% of the files so far, ALL of which were originally produced on a Windows machine;
2) the problem does not occur in the same place in each file, IF it occurs at all;
3) if I re-do the import of a file that imports incorrectly, the problems occur at the same places in the file;
4) if I remove portions of the file (using a text editor, perhaps), the problems may occur in different spots, or the problems may disappear.

What I've tried: Import the data sets using Import["file name", "Table"]. Typically the datasets have ".txt" or ".dat" extensions. Some files consist of number triplets; some of doublets; they're all TAB-separated.

Expected behavior: the data is imported correctly; files with n lines of m numbers per line should show up as tables consisting of n length-m lists. This is what happens most of the time.

Actual behavior: The data values in a given file import correctly, but occasional empty lists are interspersed among the data points.
For example, a 2*10^5 length dataset has empty lists ( {} ) at seven different places in it. A 10^4 length dataset has only one empty list.

So far, I've just been importing the datasets, searching for lines which do not contain the expected doublets or triplets, and just deleting those lines. But that's obviously extra work (even if Mathematica does it for me). I've been able to cut some of these example data files down a bit, and still retain the "bad" behavior.

If anyone can shed some light on this for me, I'd much appreciate it!

--
==========================================================
Curtis Osterhoudt
cfo at remove_this.lanl.and_this.gov
PGP Key ID: 0x4DCA2A10
Please avoid sending me Word or PowerPoint attachments
See http://www.gnu.org/philosophy/no-word-attachments.html
==========================================================
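
Since copying and pasting the data loses whatever formatting triggers the problem, one way to narrow it down is to look at the raw bytes of an affected file before any import step interprets them. The Python sketch below does that. It rests on an assumption the post does not establish: that the stray empty lists come from unusual line-break byte sequences, such as a doubled carriage return. The function names (`summarize_line_breaks`, `odd_breaks`, `report`) are hypothetical and chosen for illustration only. The default expected terminator of CR LF is assumed because the files were produced on Windows.

```python
import re
from collections import Counter

# A "line break" here is any unbroken run of CR (\r) and LF (\n) bytes.
_BREAK = re.compile(rb"[\r\n]+")


def summarize_line_breaks(raw: bytes) -> Counter:
    """Count each distinct run of CR/LF bytes in the raw file contents."""
    return Counter(m.group(0) for m in _BREAK.finditer(raw))


def odd_breaks(raw: bytes, expected: bytes = b"\r\n"):
    """Return (byte_offset, run) for every line-break run that is not `expected`."""
    return [
        (m.start(), m.group(0))
        for m in _BREAK.finditer(raw)
        if m.group(0) != expected
    ]


def report(path: str, expected: bytes = b"\r\n", limit: int = 20) -> None:
    """Print a tally of line-break runs, then the first few unexpected ones."""
    # Binary mode matters: text mode would translate newlines and hide the problem.
    with open(path, "rb") as fh:
        raw = fh.read()
    for run, count in summarize_line_breaks(raw).most_common():
        print(repr(run), count)
    for offset, run in odd_breaks(raw, expected)[:limit]:
        print(offset, repr(run))
```

Calling `report("some_file.dat")` (a placeholder file name) on one of the "bad" files lists every kind of line-break run and the byte offsets of the unexpected ones. If those offsets line up with the positions of the empty lists in the imported table, the line terminators are a likely culprit. If every run is a plain CR LF, the cause lies elsewhere and this check has at least ruled one possibility out.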