MathGroup Archive 2010

Re: Using mathematica to read website

  • To: mathgroup at
  • Subject: [mg110345] Re: Using mathematica to read website
  • From: telefunkenvf14 <rgorka at>
  • Date: Sun, 13 Jun 2010 18:53:54 -0400 (EDT)
  • References: <hv23ul$5hl$>

On Jun 13, 3:13 am, kevin <kevin999ko... at> wrote:
> Hi Guys,
>       Is there any way to use Mathematica to read all the words of a
> website? Thanks in advance.
> Best,
> Kevin

I'm not experienced with this kind of thing, but for the case at hand
I think you'll have to Import[] the page as "Source" and then parse
the resulting HTML for what you want.
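
As a rough sketch of that idea (not tested against any particular
site), the snippet below imports one page in a few different ways.
The URL is a hypothetical placeholder; "Source", "Plaintext" and
"Hyperlinks" are standard Import elements for HTML. If all you need is
the words on the page, "Plaintext" may save you from parsing the HTML
yourself.

```mathematica
(* Hypothetical placeholder URL; substitute the page you want to read *)
url = "http://example.com/";

(* Raw HTML of the page as a single string *)
src = Import[url, "Source"];

(* Visible text of the page with the markup stripped *)
txt = Import[url, "Plaintext"];

(* Split the text on whitespace to get a list of words *)
words = StringSplit[txt];

(* All link targets found on the page *)
links = Import[url, "Hyperlinks"];
```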

For a specific example, I'll start on the Technology summary page,
with the eventual goal of grabbing the title of each article, the url,
and the corresponding short summary:

(* the URL is blank in the archived post *)
raw = Import["", "Source"];

After looking at the raw source code in Firefox, I think this will get
us close to grabbing some relevant info:

StringPosition[ToString[raw], "<a class=\"summheadline\"" ~~ __ ~~ " </

Then use StringTake[] to extract the substrings at those positions:

raw2 = StringTake[raw, %]
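
One possible way to take this a step further, offered as a sketch
only: StringCases[] with named patterns can pull out the title and url
in one pass, and wrapping each pattern in Shortest[] stops a match
from running on to a later closing tag. The sample string, the href
attribute layout and the closing "</a>" are all assumptions made up
for illustration, not the real page source.

```mathematica
(* Made-up sample HTML; the real page layout is an assumption *)
sample = StringJoin[
   "<a class=\"summheadline\" href=\"/one.html\">First title</a>\n",
   "<p>First summary</p>\n",
   "<a class=\"summheadline\" href=\"/two.html\">Second title</a>\n"];

(* u captures the url, t captures the title; Shortest keeps each
   capture from extending past the nearest closing delimiter *)
headlines = StringCases[sample,
   "<a class=\"summheadline\" href=\"" ~~ Shortest[u__] ~~ "\">" ~~
     Shortest[t__] ~~ "</a>" :> {t, u}]

(* {{"First title", "/one.html"}, {"Second title", "/two.html"}} *)
```

Since Import[..., "Source"] already returns a string, the same call
should work on raw directly, without ToString[].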

My inexperience in parsing strings stops me here... Hopefully someone
else can chime in. I'd like to learn.

