Re: Using mathematica to read website
- To: mathgroup at smc.vnet.net
- Subject: [mg110345] Re: Using mathematica to read website
- From: telefunkenvf14 <rgorka at gmail.com>
- Date: Sun, 13 Jun 2010 18:53:54 -0400 (EDT)
- References: <hv23ul$5hl$1@smc.vnet.net>
On Jun 13, 3:13 am, kevin <kevin999ko... at gmail.com> wrote:
> Hi Guys,
>
> Is there any way to use mathematica to read all the words of a
> website, say www.bloomberg.com? Thanks in advance.
>
> Best,
> Kevin

I'm not experienced with this kind of thing, but for the case at hand I think what you'll have to do is Import[] the page as "Source" and then parse the source code for what you want.

For a specific example, I'll start on the Technology summary page, with the eventual goal of grabbing the title of each article, the URL, and the corresponding short summary:

  raw = Import["http://www.bloomberg.com/news/industries/technology.html", "Source"]

After looking at the raw source code in Firefox, I think this will get us close to grabbing some relevant info:

  (* raw is already a string, so no ToString is needed; Shortest stops each
     match at the first closing tag instead of the last one on the page *)
  StringPosition[raw, "<a class=\"summheadline\"" ~~ Shortest[__] ~~ "</p>"]

And use StringTake[] to grab the above positions:

  raw2 = StringTake[raw, %]

My inexperience in parsing strings stops me here... Hopefully someone else can chime in. I'd like to learn.

-RG
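
A possible way to carry on from the StringTake[] step, offered only as a rough sketch: StringCases[] with named patterns can pull out the title, URL, and summary in one pass. The sample string below is made up so the snippet runs without a network connection. The href attribute and the <p> summary tag are assumptions about the markup, not something checked against the actual page, so the pattern would need adjusting to whatever the real source looks like.

```mathematica
(* Hypothetical sample standing in for the imported page source;
   the real markup on the site may well differ. *)
sample = StringJoin[
   "<a class=\"summheadline\" href=\"/news/a.html\">First headline</a>",
   " <p>First summary.</p>\n",
   "<a class=\"summheadline\" href=\"/news/b.html\">Second headline</a>",
   " <p>Second summary.</p>"];

(* Each Shortest[...] stops at the first delimiter that follows it,
   so one match covers exactly one article. *)
articles = StringCases[sample,
   "<a class=\"summheadline\" href=\"" ~~ Shortest[url__] ~~ "\">" ~~
     Shortest[title__] ~~ "</a>" ~~ Shortest[___] ~~ "<p>" ~~
     Shortest[summary__] ~~ "</p>" :> {title, url, summary}];

(* One {title, url, summary} triple per article. *)
articles
```

With the real page, sample would be replaced by raw from the Import[] call above. If only the visible words of a page are wanted, as in the original question, Import[url, "Plaintext"] may already be enough.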