MathGroup Archive: September 2008 [00107]

Re: How can I do a "grep -v" equivalent in Import[]?

To: mathgroup at smc.vnet.net
Subject: [mg91775] Re: How can I do a "grep -v" equivalent in Import[]?
From: Bill Rowe <readnews at sbcglobal.net>
Date: Sun, 7 Sep 2008 05:40:28 -0400 (EDT)

On 9/6/08 at 2:07 AM, jasonbrent at gmail.com (Jason Ledbetter) wrote:

>Folk, I have some data that is column formatted with a dual line
>header every few lines; there may also be a variable number of lines
>between each header output. The first line is readily identifiable
>with say "grep ^FOO", but the second line in the header has white
>space at the beginning.

>So what I need to do is the equivalent of "grep -v" to get rid of
>the headers. My ultimate goal is to create a named List[] variable
>for each "column" in the data, but I need to get rid of the headers.

>Alternatively, I could regex match on the lines I DO want I think..
>just trying to wrap my head around how to do this in Mathematica.
>For example, "egrep '^[0-9]+' foo.txt" gets me what I need in most
>shells... I'd just like mathematica to do the same on import.

>Do I need to process the import procedurally and dump the data I
>want into a new Table[]?

Almost certainly, the answer to the above is no. I am not aware
of anything in Mathematica that can be accomplished using
procedural code that cannot be accomplished using functional
code and/or pattern matching.

But beyond this it is difficult to know how to address the
problem you are having. How are you reading the data into
Mathematica? What separates the columns in the original data
file? Is the first column a string?

Assume the columns are delimited by commas. Then you should be
able to use Import[filename,"CSV"] to read the data into
Mathematica. I believe this will strip the leading white space
from those lines you indicate start with white space.

Once you have the data read into Mathematica, deleting the
headers can be easily done using DeleteCases. Assume for the
moment the first column of the data with a header is a string
beginning with "FOO". Then

DeleteCases[data, {_?(StringMatchQ[#,"FOO"~~___]&),__}]

will delete the header lines. Or, if you prefer, since Mathematica
does support regular expressions,

DeleteCases[data,{_?(StringMatchQ[#,RegularExpression@"FOO.*"]&),__}]

Or if it is easier to identify the lines you want to keep, use
Cases instead of DeleteCases with the appropriate pattern. For
example, assume all of the lines you want have a numeric value
in the first column. Then

Cases[data, {_?NumericQ,__}]

will retain just those lines.
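
As a small illustration of the two approaches above, here is a sketch on a made-up data set. The sample rows and the names data, noHeaders and numericRows are hypothetical; the real structure depends on how the file was imported. The _String restriction is my addition, so that StringMatchQ is only ever applied to strings.

```mathematica
(* Hypothetical sample: a two-line header repeated between numeric rows *)
data = {
   {"FOO", "BAR", "BAZ"},
   {"units", "s", "m"},
   {1, 0.5, 10},
   {2, 0.7, 12},
   {"FOO", "BAR", "BAZ"},
   {"units", "s", "m"},
   {3, 0.9, 15}};

(* Remove rows whose first element is a string starting with "FOO" *)
noHeaders = DeleteCases[data, {_String?(StringMatchQ[#, "FOO" ~~ ___] &), __}]

(* Keep only rows whose first element is numeric *)
numericRows = Cases[data, {_?NumericQ, __}]
```

On this sample, the DeleteCases form still leaves the "units" rows in place, since they do not begin with "FOO", while the Cases form drops both header lines at once. For a two-line header like the one described, selecting the rows to keep is probably the simpler route.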

>I've been able to figure out how to get columnar data out of the
>import using "foo=data[[All,{1}]]" and so on (e.g., first column in
>the data)... I just need to pre-filter the headers out.

As you have found, data[[All,{1}]] will get all of column 1. This
construct returns a two-dimensional array (a list of one-element
lists). You may find it better to use data[[All,1]] instead, which
returns a one-dimensional list.
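
To make the difference concrete, here is a minimal sketch on a hypothetical header-free data set. The names rows, id, t and x are made up for illustration, and Transpose is one possible way to reach the stated goal of a named list per column, not the only one.

```mathematica
(* Hypothetical header-free data with three columns *)
rows = {{1, 0.5, 10}, {2, 0.7, 12}, {3, 0.9, 15}};

rows[[All, {1}]]   (* {{1}, {2}, {3}}: a 3 x 1 matrix *)
rows[[All, 1]]     (* {1, 2, 3}: a flat list *)

(* One named list per column; the names id, t and x are made up *)
{id, t, x} = Transpose[rows];
```

The Transpose form assumes every row has the same number of elements, which should hold once the header lines have been filtered out.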
