MathGroup Archive: March 2010 [00624]


Re: Import html

To: mathgroup at smc.vnet.net
Subject: [mg108466] Re: Import html
From: "Sjoerd C. de Vries" <sjoerd.c.devries at gmail.com>
Date: Fri, 19 Mar 2010 02:47:01 -0500 (EST)
References: <hnsrtt$5ks$1@smc.vnet.net>

Hi Scipione,

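The URL ends in ".m", the extension Import associates with Mathematica package files. Import therefore tries to read the page as Wolfram Language input, which is why Read chokes on the DOCTYPE line. Given this uncommon file extension, you have to state explicitly that you're dealing with HTML, like this: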

Import["http://www.paginegialle.it/ascensoriromamir.a.m", {"HTML",
  "Hyperlinks"}]
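If you are applying this to many pages, one option is to wrap the explicit format in a small helper. This is a minimal sketch. The name htmlLinks and the sample HTML string are illustrative assumptions, not part of any built-in API. The ImportString line lets you check the {"HTML", "Hyperlinks"} specification offline, without depending on the original page still being online.

```mathematica
(* Hypothetical helper: always use the HTML importer, regardless of
   the extension at the end of the URL *)
htmlLinks[url_String] := Import[url, {"HTML", "Hyperlinks"}]

(* Offline check of the same format/element specification on an inline
   HTML string; the markup and example.com URL are made up for illustration *)
sample = "<html><body><a href=\"http://example.com/page\">link</a></body></html>";
links = ImportString[sample, {"HTML", "Hyperlinks"}]
```

With network access, htmlLinks /@ {"http://www.paginegialle.it/ascensoriromamir.a.m", "http://www.paginegialle.it/esis"} would then handle both pages from this thread the same way, whatever their extension.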

Cheers -- Sjoerd

On Mar 18, 11:31 am, Scipione Dal Ferro <scipionedalfe... at yahoo.it>
wrote:
> Hi there,
>
> I use Import to parse the hyperlinks of many similar HTML pages without any problem, but for a few pages (such as the one in the subject) it fails.
> In more detail, here is the example with the result:
>
> In[1]:= Import["http://www.paginegialle.it/ascensoriromamir.a.m", "Hyperlinks"]
>
> Read::readt: Invalid input found when reading <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>  from C:\Users\scipione.dalferro\AppData\Local\Temp\mFA3E.tmp\ascensoriromamir.a.m. >>
>
> Out[1]= $Failed
>
> The error message states there's invalid input; however, the page opens correctly in a browser.
>
> I tried changing the element to "Source" and others, but got the same result.
> Similar pages work correctly, for example this one:
>
> In[2]:= Import["http://www.paginegialle.it/esis", "Hyperlinks"]
>
> I hope you can help me understand this issue.
>
> Thanks,
> Scipione
