
MathGroup Archive 2009


Re: Problem importing HTML with Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg96239] Re: Problem importing HTML with Mathematica
  • From: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
  • Date: Tue, 10 Feb 2009 05:49:14 -0500 (EST)
  • Organization: The Open University, Milton Keynes, UK
  • References: <gmp0k6$brg$1@smc.vnet.net>

In article <gmp0k6$brg$1 at smc.vnet.net>, oiluj1 at gmail.com wrote:

> I had a Mathematica program that, among other things, had to import
> html code from a web site.
> 
> Recently they made some changes in the web site and the program is not
> importing properly anymore. Instead of getting the source html code I
> would see with my internet browser I am just getting a simplified
> version of it (with no html syntax at all).

As stated in the documentation for the HTML file format [1], 
"Import["file.html"] gives a plain text representation of an HTML file."

> I do not know what may have changed, but my guess is that the web
> site detects that the client connecting to it is not a proper web
> browser and therefore sends back a simplified version of the page.

It works the same way with versions 6.0.3 and 7.0 on two different 
platforms, so perhaps you have changed something on your system.

> As an example, you can just run:
> Import["http://www.atpworldtour.com/5/en/vault/draws.asp?TournamentID=352&TournamentYear=1993"]

Anyway, add the element "Source" to your query to get the HTML source of 
the page.

In[2]:= Import["http://www.atpworldtour.com/5/en/vault/draws.asp?\
TournamentID=352&TournamentYear=1993", "Source"]

Out[2]= "

<!-- PAGE TITLE -->
<html><head><title>atpworldtour.com - Event Draw</title>

<!-- include virtual = \"/en/common/top/header_nrbt.asp\" -->
<meta name=\"robots\" content=\"noindex, nofollow\">
<meta http-equiv=\"content-type\" \
content=\"text/html;charset=iso-8859-1\">

[... Very long block of HTML code deleted ...]
      
         <td><img src=\"space.gif\" height=\"0\" width=\"100\"></td>
      
      </tr>
      </table>
</div>
</body>
</html>
"
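To compare the importer's elements without depending on the 
atpworldtour.com page (which may have changed again since), here is a 
minimal sketch that writes a small HTML string to a temporary file and 
imports it in different ways. The file name demo.html and the sample 
markup are illustrative assumptions, not taken from the original page.

```mathematica
(* Hypothetical sample page, written to a temporary file so the example runs offline *)
html = "<html><head><title>Demo</title></head><body><p>Hello <b>world</b></p></body></html>";
file = Export[FileNameJoin[{$TemporaryDirectory, "demo.html"}], html, "Text"];

(* Default import: a plain-text rendering with the markup stripped *)
Import[file]

(* The "Source" element returns the raw HTML as a string *)
Import[file, "Source"]

(* List every element the HTML importer can return for this file *)
Import[file, "Elements"]
```

The same "Elements" query works on a URL, which is a quick way to see 
what else (titles, hyperlinks, images) can be extracted besides the 
plain text and the source.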

Regards,
--Jean-Marc

[1] "HTML (.html, .htm)", Mathematica Documentation Center, 
ref/format/HTML, 
http://reference.wolfram.com/mathematica/ref/format/HTML.html

