MathGroup Archive: June 2010 [00224]

Re: Using mathematica to read website

To: mathgroup at smc.vnet.net
Subject: [mg110345] Re: Using mathematica to read website
From: telefunkenvf14 <rgorka at gmail.com>
Date: Sun, 13 Jun 2010 18:53:54 -0400 (EDT)
References: <hv23ul$5hl$1@smc.vnet.net>

On Jun 13, 3:13 am, kevin <kevin999ko... at gmail.com> wrote:
> Hi Guys,
>
>       Is there any way to use mathematica to read all the words of a
> website, say www.bloomberg.com? Thanks in advance.
>
> Best,
> Kevin

I'm not experienced with this kind of thing, but for the case at hand
I think what you'll have to do is Import[] as "Source" and then parse
the source code for what you want.
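
If only the visible words of a page are needed, a possibly simpler route is
the "Plaintext" element of the HTML importer, which returns the rendered text
instead of the markup. A minimal sketch, run on a made-up inline page so that
it works without network access (the HTML string and the variable names are
assumptions for illustration, not Bloomberg's markup):

```mathematica
(* Hypothetical inline page, used so the sketch runs without network access *)
page = "<html><body><h1>Tech news</h1><p>Chips are faster this year.</p></body></html>";

(* Ask the HTML importer for the rendered text rather than the markup *)
text = ImportString[page, {"HTML", "Plaintext"}];

(* Split on whitespace to get the individual words *)
words = StringSplit[text]
```

For a live page the same element should be available through
Import["http://www.bloomberg.com/", "Plaintext"], though what comes back
depends on how the site builds its pages.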

For a specific example, I'll start on the Technology summary page,
with the eventual goal of grabbing the title of each article, the url,
and the corresponding short summary:

raw = Import[
  "http://www.bloomberg.com/news/industries/technology.html", "Source"
  ]

After looking at the raw source code in Firefox, I think this will get
us close to grabbing some relevant info:

StringPosition[raw, Shortest["<a class=\"summheadline\"" ~~ __ ~~ " </p>"]]

And use StringTake[] to grab the above positions:

raw2 = StringTake[raw, %]

My inexperience in parsing strings stops me here... Hopefully someone
else can chime in. I'd like to learn.
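
One possible way to continue, sketched on a made-up fragment rather than the
live page: StringCases with named patterns wrapped in Shortest can pull the
title, URL and summary out of each block in one pass. The sample markup below
(an href attribute right after the class, and the summary in a following <p>)
is an assumption about the page layout, and the names sample and articles are
hypothetical:

```mathematica
(* Hypothetical sample markup; the real page layout is an assumption here *)
sample = "<a class=\"summheadline\" href=\"/news/a1.html\">First title</a> <p>First summary </p>" <>
   "<a class=\"summheadline\" href=\"/news/a2.html\">Second title</a> <p>Second summary </p>";

(* Shortest[...] stops each named piece at the nearest following delimiter, *)
(* so one match cannot run on into the next article                        *)
articles = StringCases[sample,
   "<a class=\"summheadline\" href=\"" ~~ Shortest[url__] ~~ "\">" ~~
     Shortest[title__] ~~ "</a>" ~~ Shortest[___] ~~ "<p>" ~~
     Shortest[summary__] ~~ "</p>" :>
    {title, url, StringTrim[summary]}]
```

Each element of articles is a {title, url, summary} triple. On the real page
the pattern would likely need adjusting to whatever attributes and whitespace
actually appear in the source.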

-RG
