MathGroup Archive 2009

Re: Problem importing HTML with Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg96291] Re: Problem importing HTML with Mathematica
  • From: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
  • Date: Wed, 11 Feb 2009 05:18:12 -0500 (EST)
  • Organization: The Open University, Milton Keynes, UK
  • References: <gmrmfv$a0o$1@smc.vnet.net>

In article <gmrmfv$a0o$1 at smc.vnet.net>, Julio <oiluj1 at gmail.com> wrote:

> I had a Mathematica program that, among other things, had to import
> html code from a web site.
>
> Recently they made some changes in the web site and the program is not
> importing properly anymore. Instead of getting the source html code I
> would see with my internet browser I am just getting a simplified
> version of it (with no html syntax at all).

As specified in the documentation about the HTML file format [1],
"Import["file.html"] gives a plain text representation of an HTML file."

> I do not know what may have changed, but as a guess, it seems to me as
> if the web site realized that it is not a proper internet browser the
> one who is connecting and therefore it is sending back a simplified
> version of the page.

The behaviour is the same for me with versions 6.0.3 and 7.0 on two
different platforms. Perhaps you have changed something on your system.

> As an example, you can just run:
> Import["http://www.atpworldtour.com/5/en/vault/draws.asp?\
> TournamentID=352&TournamentYear=1993"]

Anyway, add the element "Source" to your query to get the HTML source of
the page.

In[2]:= Import["http://www.atpworldtour.com/5/en/vault/draws.asp?\
TournamentID=352&TournamentYear=1993", "Source"]

Out[2]= "

<!-- PAGE TITLE -->
<html><head><title>atpworldtour.com - Event Draw</title>

<!-- include virtual = \"/en/common/top/header_nrbt.asp\" -->
<meta name=\"robots\" content=\"noindex, nofollow\">
<meta http-equiv=\"content-type\" \
content=\"text/html;charset=iso-8859-1\">

[... Very long block of HTML code deleted ...]

        <td><img src=\"space.gif\" height=\"0\" width=\"100\"></td>

     </tr>
     </table>
</div>
</body>
</html>
"
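
To make the difference concrete, here is a minimal sketch that contrasts the
default import with the "Source" element. The variable names (url, text, html)
are only illustrative, and the sketch assumes the page from the original
question is still reachable.

```mathematica
(* URL taken from the original question *)
url = "http://www.atpworldtour.com/5/en/vault/draws.asp?TournamentID=352&TournamentYear=1993";

(* Default import: plain-text representation, markup stripped *)
text = Import[url];

(* Explicit "Source" element: the raw HTML as a single string *)
html = Import[url, "Source"];

(* List the elements that Import can extract from this page *)
Import[url, "Elements"]

(* Rough check that the markup survived in the "Source" result *)
! StringFreeQ[html, "<html"]
```

If only part of the page is needed, one of the other elements reported by
"Elements" (for example "Hyperlinks" or "Data") may spare you from parsing the
raw source yourself.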

Regards,
--Jean-Marc

[1] "HTML (.html, .htm)", Documentation Center, ref/format/HTML. Web:
http://reference.wolfram.com/mathematica/ref/format/HTML.html

