MathGroup Archive: June 2011 [00520]

Re: parsing a string

To: mathgroup at smc.vnet.net
Subject: [mg119849] Re: parsing a string
From: Bill Rowe <readnews at sbcglobal.net>
Date: Sat, 25 Jun 2011 05:29:04 -0400 (EDT)

On 6/24/11 at 7:47 AM, tsariysk at craft-tech.com (Ted Sariyski) wrote:

>Hi, I import a text file with Import[filename,"Table"]. The file has
>a header followed by data. The header contains predefined keywords
>like TITLE, ZONE, VARIABLES, etc.  A VARIABLES line, shown below, is
>a list of 'varname,varunits' pairs, which I need to extract as
>pairs.

>{{VARIABLES,=,'um','I_p,(W/sr/um)','I_a,(W/sr/um)','I_ae,(W/sr/um)',
>'m','I_p,(W/sr/m)','I_a,(W/sr/m)','I_ae,(W/sr/m)'}}

>I tried StringSplit[varList,"'"] but got Out[]:
>{{um,,,I_p},{(W/sr/um),,,I_a},{(W/sr/um),,,I_ae},...}}, which is
>wrong.

It looks like the line starting {{VARIABLES you posted is the
result of the Import function, not the actual text in the file
you want to parse. So I copied the text from your post and used
Import to recreate the list I think you got from your own
Import call, i.e.

In[6]:= vars =
  Import[StringToStream[
    "VARIABLES,=,'um','I_p,(W/sr/um)','I_a,(W/sr/um)','I_ae,(W/sr/um)',\
'm','I_p,(W/sr/m)','I_a,(W/sr/m)','I_ae,(W/sr/m)'"], "CSV"]

Out[6]= {{"VARIABLES", "=", "'um'", "'I_p", "(W/sr/um)'", "'I_a",
   "(W/sr/um)'", "'I_ae", "(W/sr/um)'", "'m'", "'I_p", "(W/sr/m)'",
   "'I_a", "(W/sr/m)'", "'I_ae", "(W/sr/m)'"}}

Assuming I have correctly understood your post, StringSplit
isn't the right tool, since you have a list of strings rather
than a single string. This will work:

In[7]:= StringReplace[vars[[1, 3 ;;]], "'" -> ""]

Out[7]= {um,I_p,(W/sr/um),I_a,(W/sr/um),I_ae,(W/sr/um),m,I_p,(W/sr/m),I_a,
(W/sr/m),I_ae,(W/sr/m)}

But if what you posted was a single string, then this does the trick:

In[9]:= StringSplit[
   StringReplace[
    "VARIABLES,=,'um','I_p,(W/sr/um)','I_a,(W/sr/um)','I_ae,(W/sr/um)',\
'm','I_p,(W/sr/m)','I_a,(W/sr/m)','I_ae,(W/sr/m)'", "'" -> ""],
   ","][[3 ;;]]

Out[9]= {um,I_p,(W/sr/um),I_a,(W/sr/um),I_ae,(W/sr/um),m,I_p,(W/sr/m),I_a,
(W/sr/m),I_ae,(W/sr/m)}

But I do wonder if either of these is truly correct. Except for
the first um (and the later m), each entry looks like a variable
followed by the units for that variable, i.e., I_p appears to be
a variable in units of watts/steradian/micrometer. If this
interpretation is correct, the first um is most likely the units
(micrometers) of a variable whose name is missing.
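
If you do want the entries grouped as pairs, one possible sketch
(assuming the input is a single string, and using a shortened
copy of your line as sample data) is to pull out each quoted
group first and only then split on the comma. The names s,
groups and pairs are just illustrative:

    (* shortened sample of the posted line; not the real file text *)
    s = "VARIABLES,=,'um','I_p,(W/sr/um)','I_a,(W/sr/um)'";

    (* take everything between each pair of single quotes *)
    groups = StringCases[s, "'" ~~ (g : Except["'"] ..) ~~ "'" :> g];

    (* split each quoted group into name and units *)
    pairs = StringSplit[groups, ","]

A group such as 'um' has no comma, so it comes back as a
one-element list, which makes the entries with a missing name
easy to spot.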

Quite frankly, if I were reading in data from a text file that
was tagged with all of the data on one line following the tag, I
would not use Import. Instead I would use FindList and then
parse the strings returned by FindList. This will execute faster
since unlike Import, FindList makes no attempt to
interpret/parse what is being read in.
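
A minimal sketch of that approach follows. The file name
"data.dat" is made up, and I am assuming the raw line in the
file is comma separated in the same way as the text you posted,
which I cannot check from here:

    (* "data.dat" is a hypothetical file name *)
    lines = FindList["data.dat", "VARIABLES"];

    (* FindList returns the matching lines as plain strings,
       so the parsing is done with the usual string functions *)
    fields = If[lines === {}, {},
       StringSplit[StringReplace[First[lines], "'" -> ""], ","]]

If the delimiters in the actual file differ, only the
StringSplit step should need to change.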

I would also recommend using StringReplace to delete the
underscore character. The underscore has built-in meaning in
Mathematica (it denotes a pattern blank) and will cause problems
if you use ToExpression to change the strings to symbols.
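
As a small illustration (the sample names are taken from your
post; the variable names is just for this example), deleting the
underscore before any conversion could look like this:

    (* sample variable names from the posted line *)
    names = {"I_p", "I_a", "I_ae"};

    (* remove the underscore so the names are ordinary symbol names *)
    StringReplace[names, "_" -> ""]

Without this step, a name of the form x_y is read by
Mathematica as the pattern Pattern[x, Blank[y]] rather than as
a single symbol.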
