Re: Import html
- To: mathgroup at smc.vnet.net
- Subject: [mg108466] Re: Import html
- From: "Sjoerd C. de Vries" <sjoerd.c.devries at gmail.com>
- Date: Fri, 19 Mar 2010 02:47:01 -0500 (EST)
- References: <hnsrtt$5ks$1@smc.vnet.net>
Hi Scipione,

Given the uncommon file extension (Import picks the format from the extension, and ".m" is the extension of Mathematica package files, so the page gets read as Mathematica input), you have to make explicit that you're dealing with HTML, like this:

Import["http://www.paginegialle.it/ascensoriromamir.a.m", {"HTML", "Hyperlinks"}]

Cheers --
Sjoerd

On Mar 18, 11:31 am, Scipione Dal Ferro <scipionedalfe... at yahoo.it> wrote:
> Hi there,
>
> I use Import to parse the hyperlinks of many similar HTML pages without any problem, but for a few pages (such as the example in the subject) it fails.
> In more detail, here is the example with the result:
>
> In[1]:= Import["http://www.paginegialle.it/ascensoriromamir.a.m", "Hyperlinks"]
>
> Read::readt: Invalid input found when reading <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> from C:\Users\scipione.dalferro\AppData\Local\Temp\mFA3E.tmp\ascensoriromamir.a.m. >>
>
> Out[1]= $Failed
>
> The error message says there is invalid input, yet the page opens correctly in a browser.
>
> I tried changing the element to "Source" and others, but got the same result.
> Similar pages work correctly, such as this one:
>
> In[2]:= Import["http://www.paginegialle.it/esis", "Hyperlinks"]
>
> Hope you can help me understand this issue.
>
> Thanks,
> Scipione