MathGroup Archive 2010

Re: Using mathematica to read website

  • To: mathgroup at smc.vnet.net
  • Subject: [mg110345] Re: Using mathematica to read website
  • From: telefunkenvf14 <rgorka at gmail.com>
  • Date: Sun, 13 Jun 2010 18:53:54 -0400 (EDT)
  • References: <hv23ul$5hl$1@smc.vnet.net>

On Jun 13, 3:13 am, kevin <kevin999ko... at gmail.com> wrote:
> Hi Guys,
>
>       Is there any way to use mathematica to read all the words of a
> website, say www.bloomberg.com? Thanks in advance.
>
> Best,
> Kevin

I'm not experienced with this kind of thing, but for the case at hand
I think what you'll have to do is Import[] as "Source" and then parse
the source code for what you want.
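
If all you want is the visible words rather than specific pieces of the markup, a different route from "Source" is to import the "Plaintext" element and split it into words. A minimal sketch, using a made-up HTML string (sampleHtml is my own placeholder, not Bloomberg's markup) so it runs without a network connection; for a live page you would call Import[url, "Plaintext"] instead:

```mathematica
(* sampleHtml is a hypothetical stand-in for a downloaded page *)
sampleHtml = "<html><body><h1>Tech news</h1><p>Chips get faster.</p></body></html>";

(* Read the HTML as plain text, dropping the tags *)
text = ImportString[sampleHtml, {"HTML", "Plaintext"}];

(* Collect runs of word characters as the list of words *)
words = StringCases[text, WordCharacter ..]
```

This loses the page structure, so it does not help with pairing titles, urls and summaries.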

For a specific example, I'll start on the Technology summary page,
with the eventual goal of grabbing the title of each article, the url,
and the corresponding short summary:

raw = Import[
  "http://www.bloomberg.com/news/industries/technology.html", "Source"
  ]

After looking at the raw source code in Firefox, I think this will get
us close to grabbing some relevant info:

StringPosition[ToString[raw],
  "<a class=\"summheadline\"" ~~ __ ~~ " </p>"]

Then use StringTake[] to extract the text at those positions:

raw2 = StringTake[raw, %]

My inexperience in parsing strings stops me here... Hopefully someone
else can chime in. I'd like to learn.
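
For what it's worth, here is a rough sketch of where the parsing could go next, using StringCases with named patterns instead of StringPosition plus StringTake. It runs on a made-up string (sample) rather than the live page. Only the class=\"summheadline\" anchor and the closing </p> come from what I saw in the source; the href attribute sitting right after the class and the <p> wrapper around the summary are assumptions about the markup, so the pattern would need adjusting against the real page.

```mathematica
(* sample is hypothetical markup shaped like the assumed page source *)
sample = "<a class=\"summheadline\" href=\"/news/a.html\">First title</a><p>First summary. </p>" <>
   "<a class=\"summheadline\" href=\"/news/b.html\">Second title</a><p>Second summary. </p>";

(* Shortest keeps each piece from running on into the next article *)
articles = StringCases[sample,
   "<a class=\"summheadline\" href=\"" ~~ Shortest[url__] ~~ "\">" ~~
     Shortest[title__] ~~ "</a>" ~~ Shortest[___] ~~ "<p>" ~~
     Shortest[summary__] ~~ "</p>" :>
    {title, url, StringTrim[summary]}]
```

Each element of articles is a {title, url, summary} triple. Without Shortest, a bare __ is greedy and can run from the first headline to the last </p> on the page.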

-RG