Re: Using mathematica to read website
- To: mathgroup at smc.vnet.net
- Subject: [mg110345] Re: Using mathematica to read website
- From: telefunkenvf14 <rgorka at gmail.com>
- Date: Sun, 13 Jun 2010 18:53:54 -0400 (EDT)
- References: <hv23ul$5hl$1@smc.vnet.net>
On Jun 13, 3:13 am, kevin <kevin999ko... at gmail.com> wrote:
> Hi Guys,
>
> Is there any way to use mathematica to read all the words of a
> website, say www.bloomberg.com? Thanks in advance.
>
> Best,
> Kevin
I'm not experienced with this kind of thing, but for the case at hand
I think what you'll have to do is Import[] as "Source" and then parse
the source code for what you want.
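If all you need is the visible words rather than the markup, importing
as "Plaintext" instead of "Source" might be a simpler starting point.
A rough sketch, assuming the HTML importer's "Plaintext" element does
what you want; the html string below is a made-up stand-in for a
downloaded page, and on a live page the call would be
Import[url, "Plaintext"]:

```mathematica
(* Hypothetical stand-in for a downloaded page; not real Bloomberg markup. *)
html = "<html><body><h1>Tech news</h1><p>Chips are getting faster.</p></body></html>";

(* Strip the tags and keep only the readable text. *)
text = ImportString[html, {"HTML", "Plaintext"}];

(* Split on whitespace to get a list of words (punctuation stays attached). *)
words = StringSplit[text]
```

This gives every word on the page with no structure, so it won't help
with pairing titles, urls and summaries, which is why the rest of this
post works from the source.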
For a specific example, I'll start on the Technology summary page,
with the eventual goal of grabbing the title of each article, the url,
and the corresponding short summary:
raw = Import[
"http://www.bloomberg.com/news/industries/technology.html", "Source"
]
After looking at the raw source code in Firefox, I think this will get
us close to grabbing some relevant info:
(* Shortest stops each match at the nearest " </p>" instead of the last one on the page *)
StringPosition[raw, Shortest["<a class=\"summheadline\"" ~~ __ ~~ " </p>"]]
And use StringTake[] to grab the above positions:
raw2 = StringTake[raw, %]
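One possible way to go further is StringCases, which finds the matches
and pulls out named pieces in one step. This is only a sketch: the
sample string below is invented to look roughly like the
"summheadline" links, and the names sample, articles, url, title and
summary are my own. The real page markup will almost certainly differ,
so the pattern would need adjusting against the actual source.

```mathematica
(* Hypothetical fragment standing in for raw; the real markup may differ. *)
sample = "<a class=\"summheadline\" href=\"/news/a.html\">First title</a><p>First summary </p>" <>
   "<a class=\"summheadline\" href=\"/news/b.html\">Second title</a><p>Second summary </p>";

(* Shortest keeps each named wildcard from running past the nearest closing delimiter. *)
articles = StringCases[sample,
  "<a class=\"summheadline\" href=\"" ~~ Shortest[url__] ~~ "\">" ~~
    Shortest[title__] ~~ "</a><p>" ~~ Shortest[summary__] ~~ " </p>" :>
   {title, url, summary}]
```

Each element of articles is a {title, url, summary} triple, so
something like articles[[All, 1]] would list just the titles.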
My inexperience in parsing strings stops me here... Hopefully someone
else can chime in. I'd like to learn.
-RG