MathGroup Archive 1996

Re: help! to input data...

  • To: mathgroup at smc.vnet.net
  • Subject: [mg4211] Re: help! to input data...
  • From: ianc (Ian Collier)
  • Date: Tue, 18 Jun 1996 03:25:06 -0400
  • Organization: Wolfram Research, Inc.
  • Sender: owner-wri-mathgroup at wolfram.com

In article <4pit53$jk5 at dragonfly.wolfram.com>, tcdoe+ at pitt.edu (todd c.
doehring) wrote:

> well, silly me.
> I thought i was pretty good at the basics of mathematica, but today i 
> tried to input some numerical data that was comma delimited. i.e.
> 
> 456,-45,21,0,5
> 43,25,3,66,65
> ...
> ...
> ...
> 
> it is a large data file (10Mb+) with 5 numbers per line. i can't
> convert the commas to tabs easily.
>  i tried simply using:
> 
> data=ReadList["c:\micro.dat", Number, RecordLists->True];
> 
> then:
> 
> data=ReadList["c:\micro.dat", RecordLists->True,
> WordSeparators->{","}];
> 
> and
> 
> data=ReadList["c:\micro.dat", Number, RecordLists->True,
> TokenWords->","];
> 
> but all i ever get is the first number and an error such as:
> 
> Read::readn: Syntax error reading a real number from c:\micro.dat.
> 
> would someone please tell me the best way to input this data file??
> it's driving me crazy that i can't figure out something so basic.
> 
> thanks,
> todd

The following is taken from the Technical Support FAQ area of 
Wolfram Research's web pages (http://www.wolfram.com/support/).



How do I read comma-delimited data into Mathematica?


Mathematica's ReadList command does not have any built-in options to
read comma-separated data. However, we can still use ReadList in
combination with other functions to do this. Here is an example data file. 

In[1]:= !!data.txt

1,10,1000
2,20,2000
3,30,3000
4,40,4000
5,50,5000
6,60,6000
7,70,7000
8,80,8000
9,90,9000
10,100,10000

There are two methods for reading comma-separated data. The first
method involves reading each value as a number and each comma (or
newline character) as a string. You then discard the strings, and are left
with only the values. 

We start with the ReadList command. 

In[2]:= data = ReadList["data.txt",{Number,Character}]

Out[2]= {{1, ,}, {10, ,}, {1000, }, {2, ,}, {20, ,}, {2000, }, {3, ,}, 
 
>    {30, ,}, {3000, }, {4, ,}, {40, ,}, {4000, }, {5, ,}, {50, ,}, {5000, }, 
 
>    {6, ,}, {60, ,}, {6000, }, {7, ,}, {70, ,}, {7000, }, {8, ,}, {80, ,}, 
 
>    {8000, }, {9, ,}, {90, ,}, {9000, }, {10, ,}, {100, ,}, {10000, }}

Notice that each number is paired up with a character following it (either a
comma or a newline character). We can use the InputForm command to
verify this. 

In[3]:= InputForm[data]

Out[3]//InputForm= 
  {{1, ","}, {10, ","}, {1000, "\n"}, {2, ","}, {20, ","}, {2000, "\n"}, 
   {3, ","}, {30, ","}, {3000, "\n"}, {4, ","}, {40, ","}, {4000, "\n"}, 
   {5, ","}, {50, ","}, {5000, "\n"}, {6, ","}, {60, ","}, {6000, "\n"}, 
   {7, ","}, {70, ","}, {7000, "\n"}, {8, ","}, {80, ","}, {8000, "\n"}, 
   {9, ","}, {90, ","}, {9000, "\n"}, {10, ","}, {100, ","}, {10000, "\n"}}

To remove these extra characters, we need to take the first element of
every sublist. We can Map the First command over every sublist to extract
the values. 

In[4]:= data = Map[First,data]

Out[4]= {1, 10, 1000, 2, 20, 2000, 3, 30, 3000, 4, 40, 4000, 5, 50, 5000, 6, 
 
>    60, 6000, 7, 70, 7000, 8, 80, 8000, 9, 90, 9000, 10, 100, 10000}

We now have our data, but we need it in a matrix form. Since we know that
this data is in three columns, we can use the Partition command to partition
the data into equal sublists of length 3. 

In[5]:= data = Partition[data,3]

Out[5]= {{1, 10, 1000}, {2, 20, 2000}, {3, 30, 3000}, {4, 40, 4000}, 
 
>    {5, 50, 5000}, {6, 60, 6000}, {7, 70, 7000}, {8, 80, 8000}, 
 
>    {9, 90, 9000}, {10, 100, 10000}}

The data is more clearly displayed using MatrixForm. 

In[6]:= MatrixForm[data,TableSpacing->{0}]

Out[6]//MatrixForm= 1     10    1000
                    2     20    2000
                    3     30    3000
                    4     40    4000
                    5     50    5000
                    6     60    6000
                    7     70    7000
                    8     80    8000
                    9     90    9000
                    10    100   10000

The second method involves reading each line as a string and then converting
the string into an expression. 

We simply use the ReadList command with the String specification. 

In[7]:= data = ReadList["data.txt",String] 

Out[7]= {1,10,1000, 2,20,2000, 3,30,3000, 4,40,4000, 5,50,5000, 6,60,6000, 
 
>    7,70,7000, 8,80,8000, 9,90,9000, 10,100,10000}

Note that the data looks like it has been read in as numbers. However, we
can see that they are strings by using the InputForm command. 

In[8]:= InputForm[data]

Out[8]//InputForm= 
  {"1,10,1000", "2,20,2000", "3,30,3000", "4,40,4000", "5,50,5000", "6,60,\
    6000", "7,70,7000", "8,80,8000", "9,90,9000", "10,100,10000"}

This function converts a string (containing a comma-separated sequence of
numbers) into a list of numbers. 

In[9]:= f[x_String] := ToExpression[StringJoin["{", x ,"}"]]

Now we Map this function to our data 

In[10]:= data = Map[f,data]

Out[10]= {{1, 10, 1000}, {2, 20, 2000}, {3, 30, 3000}, {4, 40, 4000}, 
 
>    {5, 50, 5000}, {6, 60, 6000}, {7, 70, 7000}, {8, 80, 8000}, 
 
>    {9, 90, 9000}, {10, 100, 10000}}

to get our matrix. 
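
As a quick sanity check (not part of the FAQ text), one can confirm that
every row has the same number of entries before treating the result as a
matrix. This matters most for the first method, since Partition silently
drops any leftover elements that do not fill a complete sublist. A minimal
sketch, assuming data holds the list of rows shown in Out[10]:

    Union[Map[Length, data]]

A result containing a single number, such as {3} for the example file,
means every line had the same number of values. 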

This method does not require you to know how many columns are in
your data. However, the first method has a speed advantage for
large data files (about 9 seconds versus about 15 seconds in the timings
below). Here are both methods used on a 200k file (from an IBM
RS/6000 workstation). 

In[11]:= Timing[ long1 = Partition[Map[First,
        ReadList["long.txt",{Number,Character}]],3]; ]

Out[11]= {9.02 Second, Null}

In[12]:= Timing[ long2 = Map[ToExpression[StringJoin["{",#,"}"]]& ,
         ReadList["long.txt",String]]; ]

Out[12]= {14.7 Second, Null}

In[13]:= long1 == long2

Out[13]= True
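
The two methods can also be combined so that the column count does not
have to be typed in by hand: read only the first line as a string to count
the columns, then use the {Number,Character} approach for the whole file.
The following is only a sketch. The name readCommaData is made up for this
example, and the code assumes that every line has the same number of values
and that there are no spaces around the commas.

    (* readCommaData is a hypothetical helper, not a built-in function *)
    readCommaData[file_String] :=
      Module[{firstLine, ncols},
        (* read just the first line of the file as a string *)
        firstLine = First[ReadList[file, String, 1]];
        (* count the values on that line, as in the second method *)
        ncols = Length[ToExpression[StringJoin["{", firstLine, "}"]]];
        (* read the whole file with the first method *)
        Partition[Map[First, ReadList[file, {Number, Character}]], ncols]
      ]

    data = readCommaData["data.txt"]

For a file like the one in the original question, with five numbers per
line, ncols would come out as 5. 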



The specific URL for this question and answer is:

http://www.wolfram.com/support/InputOutput/ExternalFiles/CommaSeparatedData.html

I hope this helps.

--Ian

-----------------------------------------------------------
Ian Collier
Wolfram Research, Inc.
-----------------------------------------------------------
tel:(217) 398-0700   fax:(217) 398-0747    ianc at wolfram.com
Wolfram Research Home Page:         http://www.wolfram.com/
-----------------------------------------------------------
