MathGroup Archive 2013

Re: Importing a file and extracting data

  • To: mathgroup at smc.vnet.net
  • Subject: [mg131162] Re: Importing a file and extracting data
  • From: David Bailey <dave at removedbailey.co.uk>
  • Date: Sat, 15 Jun 2013 04:20:48 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • Delivered-to: l-mathgroup@wolfram.com
  • Delivered-to: mathgroup-outx@smc.vnet.net
  • Delivered-to: mathgroup-newsendx@smc.vnet.net
  • References: <kpeln9$htb$1@smc.vnet.net>

On 14/06/2013 09:54, howardfink at gmail.com wrote:
> I have a series of files of this form:
>
> June 7, 2013
> Tc+Naphthalene  vs Temperature (oC)
>
> Run 1
> Tc_Naph_84_2C			3.740 ns
> Tc_Naph_87_1C			3.731 ns
> Tc_Naph_89_9C			3.720 ns
> Tc_Naph_92_9C			3.704 ns
> Tc_Naph_94_7C			3.687 ns
> Tc_Naph_97_6C			3.694 ns
>
>
> Run 2
> Tc_Naph_83_2C			3.758 ns
> Tc_Naph_83_4C			3.750 ns
> Tc_Naph_86_4C			3.728 ns
> Tc_Naph_88_1C			3.725 ns
> Tc_Naph_90_2C			3.716 ns
> Tc_Naph_93_1C			3.704 ns
> Tc_Naph_94_7C			3.673 ns
> Tc_Naph_97_7C			3.684 ns
> Tc_Naph_97_9C			3.665 ns
>
>
>
>
> I used an Import command to read in the file, but now I am just sitting and
> staring, without a clue how to get the 84_2 converted to the number 84.2,
> etc. and ending up with two lists: Run 1 and Run 2, consisting of pairs of
> temperature and time.  The temperature will eventually be converted to
> 1/absolute temperature.
>
> I've read lots and lots of help, thumbed through dozens of pages of a
> Mathematica 5 manual, and don't know where to start.  I'm trying to help a
> 90-year-old chemistry professor, who is currently using a calculator, but
> there will be dozens of runs of this experiment.
>
Import is designed to read text or binary data formatted in a standard 
form - e.g. CSV. Clearly your files have an ad-hoc format, so you can't 
expect Mathematica (or anything else) to read them without some effort!

The job is complicated by the fact that you (or the prof) used "_" 
rather than a decimal point, and that the number is joined on to other 
textual data. I am going to assume that the units (C and ns) are the 
same throughout, and can be discarded, and that all the temperatures 
have a decimal part, even if it is 0.

First define a couple of functions:

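(* Convert one block of the form {{run label}, {sample lines}}
   into {run label, {parsed samples}} *)
dataConvert[{{run_}, samples_}] := {run, Map[process, samples]};

(* Parse one sample line into {label, temperature, time} *)
process[line_] := Module[{tmp},
   (* Rewrite the line as the text of a list: quote the label, turn
      _84_2C into ,84.2, and drop the ns unit *)
   tmp = StringReplace[
     "\"" <> line, {"_" ~~ a : (DigitCharacter ..) ~~ "_" ~~
        b : (DigitCharacter ..) ~~ "C" :>
       "\"," <> a <> "." <> b <> ",", " ns" :> ""}];
   ToExpression["{" <> tmp <> "}"]
   ];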
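
To see what process does, here is a sketch that runs its two steps by hand on one sample line. The names line, rules and intermediate are only for illustration; the line itself is copied from the data file, with the tabs written as \t.

```mathematica
(* One sample line from the file; \t stands for a tab character *)
line = "Tc_Naph_84_2C\t\t\t3.740 ns";

(* The same replacement rules that process uses *)
rules = {"_" ~~ a : (DigitCharacter ..) ~~ "_" ~~
     b : (DigitCharacter ..) ~~ "C" :> "\"," <> a <> "." <> b <> ",",
   " ns" :> ""};

(* Step 1: the line becomes the text  "Tc_Naph",84.2,<tabs>3.740  *)
intermediate = StringReplace["\"" <> line, rules];

(* Step 2: wrap in braces and evaluate the text as a list *)
ToExpression["{" <> intermediate <> "}"]
(* {"Tc_Naph", 84.2, 3.74} *)
```

The first underscore in the line is left alone because it is not followed by a digit, so only the _84_2C part is rewritten.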

It is best to avoid Import if the data is not in a recognised format, 
and to read it in as strings, discarding the empty lines, and extracting 
the data in the first two lines:

In[4]:= data = ReadList["c:\\maths\\data.dat", String];

In[7]:= data = Select[data, StringTrim[#] =!= "" &];

In[5]:= fileDate = data[[1]]

Out[5]= "June 7, 2013      "

In[9]:= fileTitle = data[[2]]

Out[9]= "Tc+Naphthalene  vs Temperature (oC)"
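
The date string keeps the trailing spaces that were in the file. A minimal sketch of tidying it, assuming a version of Mathematica that has StringTrim (7 or later); the sample string is copied from Out[5]:

```mathematica
(* The date line exactly as read from the file *)
fileDate = "June 7, 2013      ";

(* Remove leading and trailing whitespace *)
StringTrim[fileDate]
(* "June 7, 2013" *)
```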

Break up the rest by detecting the 'Run' lines:

In[29]:= tmp =
  Partition[SplitBy[data[[3 ;;]], StringMatchQ[#, "Run" ~~ ___] &], 2]

Out[29]= {{{"Run 1"}, {"Tc_Naph_84_2C			3.740 ns",
    "Tc_Naph_87_1C			3.731 ns", "Tc_Naph_89_9C			3.720 ns",
    "Tc_Naph_92_9C			3.704 ns", "Tc_Naph_94_7C			3.687 ns",
    "Tc_Naph_97_6C			3.694 ns"}}, {{"Run 2"}, {"Tc_Naph_83_2C			3.758 \
ns", "Tc_Naph_83_4C			3.750 ns", "Tc_Naph_86_4C			3.728 ns",
    "Tc_Naph_88_1C			3.725 ns", "Tc_Naph_90_2C			3.716 ns",
    "Tc_Naph_93_1C			3.704 ns", "Tc_Naph_94_7C			3.673 ns",
    "Tc_Naph_97_7C			3.684 ns", "Tc_Naph_97_9C			3.665 ns"}}}

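The way SplitBy and Partition work together may be easier to see on a small made-up list. In this sketch the list toy and its entries "a", "b", "c" are placeholders, not real data:

```mathematica
(* Hypothetical stand-in for the lines after the two header lines *)
toy = {"Run 1", "a", "b", "Run 2", "c"};

(* SplitBy starts a new sublist whenever the test result changes,
   so Run lines and sample lines end up in alternating sublists *)
pieces = SplitBy[toy, StringMatchQ[#, "Run" ~~ ___] &]
(* {{"Run 1"}, {"a", "b"}, {"Run 2"}, {"c"}} *)

(* Partition pairs each Run sublist with the samples that follow it *)
Partition[pieces, 2]
(* {{{"Run 1"}, {"a", "b"}}, {{"Run 2"}, {"c"}}} *)
```

This relies on every Run line being followed by at least one sample line; an empty run would put two Run lines in the same sublist.
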
Now apply the previous functions to produce a nested list structure of 
strings and real numbers:

In[38]:= Map[dataConvert, tmp]

Out[38]= {{"Run 1", {{"Tc_Naph", 84.2, 3.74}, {"Tc_Naph", 87.1,
     3.731}, {"Tc_Naph", 89.9, 3.72}, {"Tc_Naph", 92.9,
     3.704}, {"Tc_Naph", 94.7, 3.687}, {"Tc_Naph", 97.6,
     3.694}}}, {"Run 2", {{"Tc_Naph", 83.2, 3.758}, {"Tc_Naph", 83.4,
     3.75}, {"Tc_Naph", 86.4, 3.728}, {"Tc_Naph", 88.1,
     3.725}, {"Tc_Naph", 90.2, 3.716}, {"Tc_Naph", 93.1,
     3.704}, {"Tc_Naph", 94.7, 3.673}, {"Tc_Naph", 97.7,
     3.684}, {"Tc_Naph", 97.9, 3.665}}}}
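
To get from here to the lists of temperature and time pairs you asked for, something like the following sketch should do. The names result, pairs and inverseT are only for illustration, result holds a few rows copied from the output above, and I am assuming that absolute temperature means kelvin, i.e. Celsius plus 273.15.

```mathematica
(* A few rows in the same shape as the output of Map[dataConvert, tmp] *)
result = {{"Run 1", {{"Tc_Naph", 84.2, 3.74}, {"Tc_Naph", 87.1, 3.731}}},
   {"Run 2", {{"Tc_Naph", 83.2, 3.758}}}};

(* For each run keep only columns 2 and 3: {temperature, time} *)
pairs = Map[#[[2, All, {2, 3}]] &, result]
(* {{{84.2, 3.74}, {87.1, 3.731}}, {{83.2, 3.758}}} *)

(* Replace each temperature in Celsius by 1/(absolute temperature) *)
inverseT = Map[{1/(#[[1]] + 273.15), #[[2]]} &, pairs, {2}]
```

pairs[[1]] is then the list for Run 1 and pairs[[2]] the list for Run 2, and either can be passed straight to ListPlot.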

Clearly it is best in future to record data in an easier format - 
whatever language you use to process it!

You may want to look up StringReplace and StringExpression to understand 
the above, and to help you with other problems of this type.

David Bailey
http://www.dbaileyconsultancy.co.uk



