Re: Using mathematica to read website
- To: mathgroup at smc.vnet.net
- Subject: [mg110365] Re: Using mathematica to read website
- From: "Hans Michel" <hmichel at cox.net>
- Date: Tue, 15 Jun 2010 02:30:28 -0400 (EDT)
- References: <hv23ul$5hl$1@smc.vnet.net>
Try:

    ReadList[
      StringToStream[
        StringReplace[Import["http://www.bloomberg.com/", "Source"],
          RegularExpression["<(.|\\n)*?>"] -> " "]],
      Word, WordSeparators -> {" ", "\t", "\n"}]

This particular page does not parse well as Plaintext. Even the XMLObject is
missing the body element, so the Data, FullData, Hyperlinks, and Plaintext
elements come back either blank or empty.
In[1]:= Import["http://www.bloomberg.com/","Elements"]
Out[1]= {Data,FullData,Hyperlinks,Plaintext,Source,Title,XMLObject}
So read the source, strip the tags with a brute-force regex, then stream the
result and parse it into a list of words.
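To check the approach without a network connection, the same steps can be run on a literal string. This is only a minimal sketch: the sample markup and the variable names (html, text, stream, words) are made up for illustration, and the string stands in for what Import[url, "Source"] would return.

```mathematica
(* Hypothetical sample markup standing in for Import[url, "Source"] *)
html = "<html><body><p>Hello <b>world</b></p>\n<p>second line</p></body></html>";

(* Replace every tag with a space; (.|\n) lets a tag span line breaks *)
text = StringReplace[html, RegularExpression["<(.|\\n)*?>"] -> " "];

(* Read the remaining text word by word, then release the string stream *)
stream = StringToStream[text];
words = ReadList[stream, Word, WordSeparators -> {" ", "\t", "\n"}];
Close[stream];

words
```

Note that the regex only removes the tags themselves. Anything between <script> or <style> tags, and any HTML entities, will still show up in the word list on a real page.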
Hans
"kevin" <kevin999koshy at gmail.com> wrote in message
news:hv23ul$5hl$1 at smc.vnet.net...
> Hi Guys,
>
> Is there any way to use mathematica to read all the words of a
> website, say www.bloomberg.com? Thanks in advance.
>
> Best,
> Kevin
>