MathGroup Archive: September 2008 [00183]


Re: How can I do a "grep -v" equivalent in Import[]?

To: mathgroup at smc.vnet.net
Subject: [mg91821] Re: [mg91806] How can I do a "grep -v" equivalent in Import[]?
From: "Jason Ledbetter" <jasonbrent at gmail.com>
Date: Tue, 9 Sep 2008 06:56:32 -0400 (EDT)
References: <g9t6mq$j8s$1@smc.vnet.net> <200809081010.GAA27650@smc.vnet.net>

Valid point. Here is a small sample of the input (this will likely line wrap):

--snip--
CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                 in   out   read  write  read write   age hit time  ty util                 in   out
24%     0   741     0     767  9381  6144  88815      0     0 81011     9 98%   0%  -   31%      0    26     0     0
--snip--

I'm trying to avoid importing through a pipe and to avoid preprocessing the
data (for no real reason other than getting a handle on data processing in
Mathematica).

I've done the following:

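(* read the file as a list of strings, one per line *)
importTemp = Import[ToFileName[Directory[], "input.txt"], {"Lines"}];
(* lines starting with two digits; "\\d" puts \d into the regex *)
regexMatch = "^\\d{2}.*";
sysstat = StringCases[importTemp, RegularExpression[regexMatch]];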

which gets me ALMOST what I'm looking for... but the non-matched lines end
up as empty lists, which Part doesn't like (it gives Part::partw).

For example, the output looks like this:

{{}, {}, {"24"}, {"29"}, ...
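StringCases returns {} for every line in which it finds no match, which is where the empty lists come from. A minimal sketch of two workarounds, assuming importTemp is the list of strings returned by Import (the four sample lines below are a hypothetical stand-in for input.txt): select whole lines with StringMatchQ, the closest analogue of egrep, or keep the StringCases call and drop the empty results afterwards.

```mathematica
(* Hypothetical stand-in for importTemp: two header lines, two data lines *)
importTemp = {
   "CPU   NFS  CIFS  HTTP   Total",
   "                 in   out",
   "24%     0   741     0     767",
   "29%     0   512     0     530"};

(* Option 1: keep only lines that start with two digits, like egrep *)
(* StringMatchQ must match the whole string, hence the trailing .* *)
dataLines = Select[importTemp,
   StringMatchQ[#, RegularExpression["\\d{2}.*"]] &];

(* Option 2: keep the original StringCases call, then drop the {} entries *)
(* and flatten the remaining one-element lists into plain strings *)
sysstat = StringCases[importTemp, RegularExpression["^\\d{2}.*"]];
fromCases = Flatten[DeleteCases[sysstat, {}]];
```

Both give the same list of data lines here; the Select form avoids building the empty lists in the first place.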


Thank you very much for the replies to date.

-jbl

On Mon, Sep 8, 2008 at 6:10 AM, David Bailey
<dave at remove_thisdbailey.co.uk> wrote:

> Jason Ledbetter wrote:
> > Folk,
> > I have some data that is column formatted with a dual-line header every
> > few lines; there may also be a variable number of lines between each
> > header output. The first line is readily identifiable with, say,
> > "grep ^FOO", but the second line in the header has white space at the
> > beginning.
> >
> > So what I need to do is the equivalent of "grep -v" to get rid of the
> > headers. My ultimate goal is to create a named List[] variable for each
> > "column" in the data, but I need to get rid of the headers.
> >
> > Alternatively, I could regex match on the lines I DO want, I think... just
> > trying to wrap my head around how to do this in Mathematica.
> > For example, "egrep '^[0-9]+' foo.txt" gets me what I need in most
> > shells... I'd just like Mathematica to do the same on import.
> >
> > Do I need to process the import procedurally and dump the data I want
> > into a new Table[]?
> >
> > I've been able to figure out how to get columnar data out of the import
> > using "foo=data[[All,{1}]]" and so on (e.g., the first column in the
> > data)... I just need to pre-filter the headers out.
> >
> > I apologize for the junior level of the question. :/
> >
> > Thanks!
> >
> > -jbl
> >
> >
> I think that if you post a TINY example of your input file, together
> with the corresponding output that you wish to obtain, someone will give
> you an answer. As it is, your question is a bit vague.
>
> David Bailey
> http://www.dbaileyconsultancy.co.uk
>
>
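For the original "grep -v" question quoted above, a minimal sketch that drops the header lines and then turns the data into one named list per column. It assumes, as in the sample at the top, that the first header line starts with "CPU" and the second starts with whitespace; the sample lines and the names isHeader, rows, cpu, nfs, cifs, http, and total are hypothetical. Transpose yields flat columns, like data[[All, 1]], whereas data[[All, {1}]] keeps each entry wrapped in its own one-element list.

```mathematica
(* Hypothetical stand-in for Import["input.txt", "Lines"] *)
lines = {
   "CPU   NFS  CIFS  HTTP   Total",
   "                 in   out",
   "24%     0   741     0     767",
   "29%     0   512     0     530",
   "CPU   NFS  CIFS  HTTP   Total",
   "                 in   out",
   "31%     0   803     0     820"};

(* grep -v equivalent: a header starts with "CPU" or with whitespace *)
isHeader[s_String] := StringMatchQ[s, RegularExpression["(CPU|\\s).*"]];
dataLines = Select[lines, Not[isHeader[#]] &];

(* Split each remaining line on runs of whitespace: one row per line *)
rows = StringSplit /@ dataLines;

(* Transpose turns rows into columns, one named variable per column *)
{cpu, nfs, cifs, http, total} = Transpose[rows];

(* Fields are still strings; strip the % sign before converting CPU *)
cpu = ToExpression[StringDrop[#, -1]] & /@ cpu;
cifs = ToExpression /@ cifs;
```

Blank lines would pass this filter and give StringSplit an empty row, which breaks Transpose; if the real file has them, they need dropping too (for example with DeleteCases[rows, {}]).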
