Re: Importing text files
- To: mathgroup at smc.vnet.net
- Subject: [mg86913] Re: Importing text files
- From: michael.p.croucher at googlemail.com
- Date: Wed, 26 Mar 2008 04:52:22 -0500 (EST)
- References: <fsa5h5$aef$1@smc.vnet.net>
On 25 Mar, 06:19, "Coleman, Mark" <Mark.Cole... at LibertyMutual.com> wrote:
> Greetings,
>
> I'm using Mathematica v6.02 to import a large comma delimited text file (CSV
> format). The file is about 700,000 records and takes 241 Mb of space
> according to ByteCount. The file is a mix of real and string characters.
> I've imported much smaller versions of this using the built-in support
> for Excel XLS format without any difficulties.
>
> For the larger file, the Import appears to work fine except that all of
> the string elements import with quotation marks, i.e., if you look at
> the full form, all of the string elements are expressed as
>
> {"\"YES\"",.....}
>
> With the \ character intact. This is the first I've bumped into this
> particular issue using Import.
>
> Two questions: First, is there a way that I can remove the "\"
> characters as part of the Import command? And second, if these
> characters cannot be removed during the Import process, can someone
> offer an efficient way to remove them from the imported list?
>
> Thanks,
>
> -Mark

Hi,

The behavior of Import for CSV files changed between 6.0.1 and 6.0.2. In earlier versions, a string surrounded by quotes, such as "hello", was imported as "hello", but now it is imported as "\"hello\"".
Say I have a CSV file called Book1.csv with the following data:

"hello",1
test,2
bode,3
hehehe,4
"oidjaojf",5
cfoiuhsrfipvuh,6
fojewhnfrvo,7
"werijfbwpiufv",8
nhjawhb,9
"cvijweqbv",10
vwjiebv,11

In[2]:= book = Import["Book1.csv"]

Out[2]= {{"\"hello\"", 1}, {"test", 2}, {"bode", 3}, {"hehehe", 4}, {"\"oidjaojf\"", 5}, {"cfoiuhsrfipvuh", 6}, {"fojewhnfrvo", 7}, {"\"werijfbwpiufv\"", 8}, {"nhjawhb", 9}, {"\"cvijweqbv\"", 10}, {"vwjiebv", 11}}

If you don't want the escaped quotes (\"), you can strip them out as follows:

In[3]:= f = If[StringQ[#], StringReplace[#, "\"" -> ""], #] &;

In[4]:= Map[f, book, {2}]

Out[4]= {{"hello", 1}, {"test", 2}, {"bode", 3}, {"hehehe", 4}, {"oidjaojf", 5}, {"cfoiuhsrfipvuh", 6}, {"fojewhnfrvo", 7}, {"werijfbwpiufv", 8}, {"nhjawhb", 9}, {"cvijweqbv", 10}, {"vwjiebv", 11}}

I don't know if this is the most efficient way, but it does the job.

Hope it helps,
Mike
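The function f above removes every double-quote character in a string, including any quotes that legitimately belong inside a field. If only the surrounding pair should go, a string pattern anchored with StartOfString and EndOfString can do that. The helper name stripQuotes and the sample data below are hypothetical, chosen for illustration; this is a sketch, and no claim is made about which approach is faster on a 700,000-record file.

```mathematica
(* Hypothetical helper: remove one leading and one trailing double quote
   when a string has both; any quotes inside the field are left alone. *)
stripQuotes[s_String] :=
  StringReplace[s, StartOfString ~~ "\"" ~~ body___ ~~ "\"" ~~ EndOfString :> body]

(* Non-string fields (numbers) pass through unchanged *)
stripQuotes[x_] := x

(* Hypothetical sample shaped like the Import result above;
   the third field has embedded quotes but no surrounding ones *)
book = {{"\"hello\"", 1}, {"test", 2}, {"a \"quoted\" word", 3}};

(* Apply to every field, as with Map[f, book, {2}] *)
Map[stripQuotes, book, {2}]

(* Equivalent: rewrite every string anywhere in the nested list *)
book /. s_String :> stripQuotes[s]
```

Both forms leave "a \"quoted\" word" intact, whereas f would turn it into "a quoted word".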