Re: help! to input data...
- To: mathgroup at smc.vnet.net
- Subject: [mg4211] Re: help! to input data...
- From: ianc (Ian Collier)
- Date: Tue, 18 Jun 1996 03:25:06 -0400
- Organization: Wolfram Research, Inc.
- Sender: owner-wri-mathgroup at wolfram.com
In article <4pit53$jk5 at dragonfly.wolfram.com>, tcdoe+ at pitt.edu
(todd c. doehring) wrote:

> well, silly me.
> I thought i was pretty good at the basics of mathematica, but today i
> tried to input some numerical data that was comma delimited. i.e.
>
> 456,-45,21,0,5
> 43,25,3,66,65
> ...
> ...
> ...
>
> it is a large data file (10Mb+) with 5 numbers per line. i can't
> convert the commas to tabs easily.
> i tried simply using:
>
> data=ReadList["c:\micro.dat", Number, RecordLists->True];
>
> then:
>
> data=ReadList["c:\micro.dat", RecordLists->True,
> WordSeparators->{","}];
>
> and
>
> data=ReadList["c:\micro.dat", Number, RecordLists->True,
> TokenWords->","];
>
> but all i ever get is the first number and an error such as:
>
> Read::readn: Syntax error reading a real number from c:\micro.dat.
>
> would someone please tell me the best way to input this data file??
> it's driving me crazy that i can't figure out something so basic.
>
> thanks,
> todd

The following is taken from the Technical Support FAQ area of
Wolfram Research's web pages (http://www.wolfram.com/support/).

How do I read comma-delimited data into Mathematica?

Mathematica's ReadList command does not have any built-in options to
read comma-separated data. However, we can still use ReadList in
combination with other functions to do this.

Here is an example data file.

In[1]:= !!data.txt
1,10,1000
2,20,2000
3,30,3000
4,40,4000
5,50,5000
6,60,6000
7,70,7000
8,80,8000
9,90,9000
10,100,10000

There are two methods for reading comma-separated data.

The first method involves reading each value as a number and each
comma (or newline character) as a string. You then discard the
strings, and are left with only the values.

We start with the ReadList command.

In[2]:= data = ReadList["data.txt",{Number,Character}]

Out[2]= {{1, ,}, {10, ,}, {1000, }, {2, ,}, {20, ,}, {2000, }, {3, ,},
>    {30, ,}, {3000, }, {4, ,}, {40, ,}, {4000, }, {5, ,}, {50, ,}, {5000, },
>    {6, ,}, {60, ,}, {6000, }, {7, ,}, {70, ,}, {7000, }, {8, ,}, {80, ,},
>    {8000, }, {9, ,}, {90, ,}, {9000, }, {10, ,}, {100, ,}, {10000, }}

Notice that each number is paired up with the character following it
(either a comma or a newline character). We can use the InputForm
command to verify this.

In[3]:= InputForm[data]

Out[3]//InputForm=
  {{1, ","}, {10, ","}, {1000, "\n"}, {2, ","}, {20, ","}, {2000, "\n"},
   {3, ","}, {30, ","}, {3000, "\n"}, {4, ","}, {40, ","}, {4000, "\n"},
   {5, ","}, {50, ","}, {5000, "\n"}, {6, ","}, {60, ","}, {6000, "\n"},
   {7, ","}, {70, ","}, {7000, "\n"}, {8, ","}, {80, ","}, {8000, "\n"},
   {9, ","}, {90, ","}, {9000, "\n"}, {10, ","}, {100, ","}, {10000, "\n"}}

To remove these extra characters, we need to take the first element
of every sublist. We can Map the First command over the sublists to
extract the values.

In[4]:= data = Map[First,data]

Out[4]= {1, 10, 1000, 2, 20, 2000, 3, 30, 3000, 4, 40, 4000, 5, 50, 5000, 6,
>    60, 6000, 7, 70, 7000, 8, 80, 8000, 9, 90, 9000, 10, 100, 10000}

We now have our data, but we need it in matrix form. Since we know
that this data is in three columns, we can use the Partition command
to partition the data into equal sublists of length 3.

In[5]:= data = Partition[data,3]

Out[5]= {{1, 10, 1000}, {2, 20, 2000}, {3, 30, 3000}, {4, 40, 4000},
>    {5, 50, 5000}, {6, 60, 6000}, {7, 70, 7000}, {8, 80, 8000},
>    {9, 90, 9000}, {10, 100, 10000}}

The data is more clearly displayed using MatrixForm.

In[6]:= MatrixForm[data,TableSpacing->{0}]

Out[6]//MatrixForm= 1    10    1000
                    2    20    2000
                    3    30    3000
                    4    40    4000
                    5    50    5000
                    6    60    6000
                    7    70    7000
                    8    80    8000
                    9    90    9000
                    10   100   10000

The second method involves reading each line as a string and then
converting the string into an expression.
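
Before working through the second method, note that the steps of the
first method can be collected into a single function. What follows is
only a sketch: the name readCommaSeparated and its ncols argument are
made up here for illustration and are not part of the FAQ, and the
sketch assumes that every line of the file holds the same number of
values.

```mathematica
(* Hypothetical helper, not from the FAQ: read a comma-separated file
   of numbers that has a known, fixed number of columns per line. *)
readCommaSeparated[file_String, ncols_Integer?Positive] :=
  Partition[
    (* Read {number, following character} pairs, keep only the numbers. *)
    Map[First, ReadList[file, {Number, Character}]],
    (* Regroup the flat list into rows. Partition silently drops any
       leftover values that do not fill a complete row. *)
    ncols
  ]

(* Usage with the three-column example file from the FAQ. *)
data = readCommaSeparated["data.txt", 3];
```

For a file with five numbers per line, such as the one described in
the original question, the second argument would be 5 instead of 3.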

We simply use the ReadList command with the String specification.

In[7]:= data = ReadList["data.txt",String]

Out[7]= {1,10,1000, 2,20,2000, 3,30,3000, 4,40,4000, 5,50,5000, 6,60,6000,
>    7,70,7000, 8,80,8000, 9,90,9000, 10,100,10000}

Note that the data looks as if it has been read in as numbers.
However, we can see that the elements are strings by using the
InputForm command.

In[8]:= InputForm[data]

Out[8]//InputForm=
  {"1,10,1000", "2,20,2000", "3,30,3000", "4,40,4000", "5,50,5000",
   "6,60,6000", "7,70,7000", "8,80,8000", "9,90,9000", "10,100,10000"}

This function converts a string (containing a comma-separated
sequence of numbers) into a list of numbers.

In[9]:= f[x_String] := ToExpression[StringJoin["{", x ,"}"]]

Now we Map this function over our data

In[10]:= data = Map[f,data]

Out[10]= {{1, 10, 1000}, {2, 20, 2000}, {3, 30, 3000}, {4, 40, 4000},
>    {5, 50, 5000}, {6, 60, 6000}, {7, 70, 7000}, {8, 80, 8000},
>    {9, 90, 9000}, {10, 100, 10000}}

to get our matrix.

This method does not require you to know how many columns are in
your data. However, the first method has a slight speed advantage
for large data files. Here are both methods used on a 200k file
(from an IBM RS/6000 workstation).

In[11]:= Timing[
           long1 = Partition[Map[First,
                     ReadList["long.txt",{Number,Character}]],3];
         ]

Out[11]= {9.02 Second, Null}

In[12]:= Timing[
           long2 = Map[ToExpression[StringJoin["{",#,"}"]]& ,
                     ReadList["long.txt",String]];
         ]

Out[12]= {14.7 Second, Null}

In[13]:= long1 == long2

Out[13]= True

The specific URL for this question and answer is:

http://www.wolfram.com/support/InputOutput/ExternalFiles/CommaSeparatedData.html

I hope this helps.

--Ian

-----------------------------------------------------------
Ian Collier
Wolfram Research, Inc.
-----------------------------------------------------------
tel:(217) 398-0700   fax:(217) 398-0747   ianc at wolfram.com
Wolfram Research Home Page: http://www.wolfram.com/
-----------------------------------------------------------

==== [MESSAGE SEPARATOR] ====