Re: Importing text files
- To: mathgroup at smc.vnet.net
- Subject: [mg86913] Re: Importing text files
- From: michael.p.croucher at googlemail.com
- Date: Wed, 26 Mar 2008 04:52:22 -0500 (EST)
- References: <fsa5h5$aef$1@smc.vnet.net>
On 25 Mar, 06:19, "Coleman, Mark" <Mark.Cole... at LibertyMutual.com>
wrote:
> Greetings,
>
> I'm using Mathematica v6.02 to import a large comma delimited text file (CSV
> format). The file is about 700,000 records and takes 241 Mb of space
> according to ByteCount. The file is a mix of real and string characters.
> I've imported much smaller versions of this using the built-in support
> for Excel XLS format without any difficulties.
>
> For the larger file, the Import appears to work fine except that all of
> the string elements import with quotation marks, i.e., if you look at
> the full form, all of the string elements are expressed as
>
> {"\"YES\"",.....}
>
> With the \ character intact. This is the first I've bumped into this
> particular issue using Import.
>
> Two questions: First, is there a way that I can remove the "\"
> characters as part of the Import command? And second, if these
> characters cannot be removed during the Import process, can someone
> offer an efficient way to remove them from the imported list?
>
> Thanks,
>
> -Mark
Hi
The behavior of Import for CSV files changed between 6.0.1 and 6.0.2.
In older versions, a string surrounded by quotes, such as "hello",
was imported as "hello", but now it is imported as "\"hello\"".
Say I have a CSV file called Book1.csv with the following data:
"hello",1
test,2
bode,3
hehehe,4
"oidjaojf",5
cfoiuhsrfipvuh,6
fojewhnfrvo,7
"werijfbwpiufv",8
nhjawhb,9
"cvijweqbv",10
vwjiebv,11
In[2]:= book = Import["Book1.csv"]
Out[2]= {{"\"hello\"", 1}, {"test", 2}, {"bode", 3}, {"hehehe",
4}, {"\"oidjaojf\"", 5}, {"cfoiuhsrfipvuh", 6}, {"fojewhnfrvo",
7}, {"\"werijfbwpiufv\"", 8}, {"nhjawhb", 9}, {"\"cvijweqbv\"",
10}, {"vwjiebv", 11}}
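The backslash is only how Mathematica displays a quote character
inside a string; the imported strings contain literal " characters.
If you don't want these quotes, you can strip them out as follows: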
In[3]:= f = If[StringQ[#], StringReplace[#, "\"" -> ""], #] &;
In[4]:= Map[f, book, {2}]
Out[4]= {{"hello", 1}, {"test", 2}, {"bode", 3}, {"hehehe",
4}, {"oidjaojf", 5}, {"cfoiuhsrfipvuh", 6}, {"fojewhnfrvo",
7}, {"werijfbwpiufv", 8}, {"nhjawhb", 9}, {"cvijweqbv",
10}, {"vwjiebv", 11}}
I don't know if this is the most efficient way, but it does the job.
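Note that the replacement above removes every quote character in a
field, including any in the middle. If only the enclosing pair should
go, something like the following might work. This is just a sketch:
stripOuter is a made-up name, not a built-in, and it assumes fields
shaped like the ones in Book1.csv above.

```mathematica
(* stripOuter is a hypothetical helper name, not a built-in. *)
(* Strings: drop one leading and one trailing quote, but only if both are present. *)
stripOuter[s_String] := StringReplace[s,
   StartOfString ~~ "\"" ~~ x___ ~~ "\"" ~~ EndOfString :> x];

(* Anything that is not a string (e.g. numbers) is returned unchanged. *)
stripOuter[x_] := x;

(* Apply to every field of the imported table, at level 2 as with f above. *)
Map[stripOuter, book, {2}]
```

Strings without enclosing quotes do not match the pattern, so they
pass through untouched.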
Hope it helps,
Mike