Re: Importing tab-delimited data files?
- To: mathgroup at smc.vnet.net
- Subject: [mg62688] Re: [mg62604] Importing tab-delimited data files?
- From: "Dale R. Horton" <daleh at wolfram.com>
- Date: Wed, 30 Nov 2005 22:09:02 -0500 (EST)
- References: <200511290945.EAA08728@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
On Nov 29, 2005, at 3:45 AM, AES wrote:
> I create a text file "filedata" with the following content using a text
> editor, with tabs between each number or string (5 tabs per line), and
> no content -- not even a space, just successive adjacent tabs -- in the
> empty slots.
>
> (The columns should line up if your reader uses monospaced type.)
>
> 11 aaa 22 bbb 33 ccc
> 22 bbb 33 ccc
> 33 ccc
>
> Opening Mathematica and using !!filedata reproduces exactly the same thing:
>
> 11 aaa 22 bbb 33 ccc
> 22 bbb 33 ccc
> 33 ccc
>
> Trying to follow this with
>
> fileDataAsViewed = !!filedata
>
> or
>
> fileDataAsViewed = %
>
> doesn't work, however.
!!file doesn't return a value; it prints the file contents as a side
effect. This is like evaluating
y = Print[x^2]
which prints x^2 but leaves y set to Null.
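To capture the contents in a variable, something along these lines should work. This is only a sketch, assuming the file is named filedata and sits in the current working directory; the variable names are made up for illustration.

```mathematica
(* Read the whole file as a single string, tabs and newlines included *)
fileText = Import["filedata", "Text"];

(* Or read it as a list of strings, one per line *)
fileLines = ReadList["filedata", String];
```

Either result is an ordinary expression that can be assigned and processed further, unlike the printed output of !!filedata.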
> Using the Mathematica expression
>
> Import["datafile", "Table",
> ConversionOptions->{"TableSeparators"->{{"\r","\n"},{"\t"}}}]
>
> gives:
>
> 11 aaa 22 bbb 33 ccc
> 22 bbb 33 ccc
> 33 ccc
The Table format treats a run of consecutive separators as a single
separator. That way, if you use spaces to line up columns, you don't
end up with a bunch of empty fields. The flip side is that your
adjacent tabs collapse into one and the empty cells disappear.
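As a rough analogy only (this is not a claim about how Import works internally), StringSplit can show the difference between collapsing runs of tabs and treating each tab as its own separator. The sample line below is made up.

```mathematica
(* Hypothetical row with two empty cells between "bbb" and "33" *)
line = "22\tbbb\t\t\t33\tccc";

(* A run of tabs counts as one separator: the empty cells vanish *)
StringSplit[line, "\t" ..]

(* Each tab is its own separator: the empty cells come back as "" *)
StringSplit[line, "\t"]
```

The first call behaves like the Table format described above; the second is closer to what you want for a file with blank cells.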
> Recreating the text file with a space between the tabs in the empty
> slots and applying the same Import[ ] expression, however, gives the
> "right" answer:
>
> 11 aaa 22 bbb 33 ccc
> 22 bbb 33 ccc
> 33 ccc
This is because you now have a non-separator (the space) between
each pair of adjacent separators (the tabs).
> I suppose this is not exactly unexpected. The problem is, the app that
> creates the (much larger) tab-delimited filedata text file I really want
> to load into a Mathematica Table creates numerous blank cells, i.e.
> adjacent and unspaced tabs. I guess I'll just have to go at it with a
> smart text editor and separate adjacent tabs before trying to load it.
The solution is to use the TSV (tab-separated values) format, not the
Table format. TSV treats every tab as a column boundary, so adjacent
tabs produce empty fields:
Import["datafile", "TSV"]
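If you want to check the behavior before running it on the real file, a small test along these lines may help. The sample string is made up, and I would expect the blank cells to show up as empty strings in the result, though that is worth confirming in your version.

```mathematica
(* Hypothetical two-row sample; the second row starts with two empty cells *)
sample = "11\taaa\t22\n\t\t22";

(* Parse the string the same way Import[..., "TSV"] would parse a file *)
ImportString[sample, "TSV"]
```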
-Dale