Re: How can I do a "grep -v" equivalent in Import[]?
- To: mathgroup at smc.vnet.net
- Subject: [mg91775] Re: How can I do a "grep -v" equivalent in Import[]?
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Sun, 7 Sep 2008 05:40:28 -0400 (EDT)
On 9/6/08 at 2:07 AM, jasonbrent at gmail.com (Jason Ledbetter) wrote:

>Folk, I have some data that is column formatted with a dual line
>header every few lines; there may also be a variable number of lines
>between each header output. The first line is readily identifiable
>with say "grep ^FOO", but the second line in the header has white
>space at the beginning.

>So what I need to do is the equivalent of "grep -v" to get rid of
>the headers. My ultimate goal is to create a named List[] variable
>for each "column" in the data, but I need to get rid of the headers.

>Alternatively, I could regex match on the lines I DO want I think..
>just trying to wrap my head around how to do this in Mathematica.

>For example, "egrep '^[0-9]+' foo.txt" gets me what I need in most
>shells... I'd just like mathematica to do the same on import.

>Do I need to process the import procedurally and dump the data I
>want into a new Table[]?

Almost certainly, the answer to the above is no. I am not aware of anything in Mathematica that can be accomplished using procedural code that cannot be accomplished using functional code and/or pattern matching.

But beyond this it is difficult to know how to address the problem you are having. How are you reading the data into Mathematica? What separates the columns in the original data file? Is the first column a string?

Assume the columns are delimited by commas. Then you should be able to use Import[filename,"CSV"] to read the data into Mathematica. I believe this will strip the leading white space from those lines you indicate start with white space.

Once you have the data read into Mathematica, deleting the headers can easily be done using DeleteCases. Assume for the moment that the first column of each header line is a string beginning with "FOO".
Then

  DeleteCases[data, {_?(StringMatchQ[#,"FOO"~~___]&),__}]

will delete the header lines. Or, if you prefer, since Mathematica does support regular expressions,

  DeleteCases[data,{_?(StringMatchQ[#,RegularExpression@"FOO.*"]&),__}]

Or, if it is easier to identify the lines you want to keep, use Cases instead of DeleteCases with the appropriate pattern. For example, assume all of the lines you want have a numeric value in the first column. Then

  Cases[data, {_?NumericQ,__}]

will retain just those lines.

>I've been able to figure out how to get columnar data out of the
>import using "foo=data[[All,{1}]" and so on (e.g., first column in
>the data)... I just need to pre-filter the headers out.

As you have found, data[[All,{1}]] will get all of column 1. This construct returns a 2 dimensional array. You may find it better to use data[[All,1]] instead, which returns a list with only one dimension.
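To make the filtering steps above concrete, here is a minimal sketch run on a small in-memory table rather than an imported file. The sample rows, the "units" second header line, and the variable names (noFoo, kept, col1, and so on) are made up for illustration; your imported data will look different. The _String guard in front of the StringMatchQ test is my addition, on the assumption that the numeric rows would otherwise be handed to StringMatchQ and produce messages.

```mathematica
(* Hypothetical sample shaped like the data described: a "FOO" header row,
   a second header row, then numeric rows, repeated. *)
data = {
   {"FOO", "a", "b"},
   {"units", "x", "y"},
   {1, 2.5, 3},
   {4, 5.5, 6},
   {"FOO", "a", "b"},
   {"units", "x", "y"},
   {7, 8.5, 9}
   };

(* "grep -v ^FOO": drop rows whose first field is a string starting with FOO.
   The second header row is still present after this step. *)
noFoo = DeleteCases[data, {_String?(StringMatchQ[#, "FOO" ~~ ___] &), ___}];

(* "egrep '^[0-9]+'": keep only rows whose first field is numeric.
   This removes both kinds of header row in one pass. *)
kept = Cases[data, {_?NumericQ, __}];

(* One flat list per column *)
{col1, col2, col3} = Transpose[kept];

(* Part with a bare index gives a flat list; a list index gives a column matrix *)
flat = kept[[All, 1]];      (* {1, 4, 7} *)
nested = kept[[All, {1}]];  (* {{1}, {4}, {7}} *)

(* Variant on raw text lines, e.g. as returned by Import[filename, "Lines"];
   the strings here are again invented. *)
lines = {"FOO a b", "  units x y", "1 2.5 3"};
wanted = Select[lines, StringMatchQ[#, DigitCharacter ~~ ___] &];
```

Transpose only works here because every retained row has the same length; if the rows are ragged, taking columns one at a time with kept[[All, n]] on the columns that exist in every row is the safer route.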