Re: Problem importing HTML with Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg96291] Re: Problem importing HTML with Mathematica
- From: Jean-Marc Gulliet <jeanmarc.gulliet at gmail.com>
- Date: Wed, 11 Feb 2009 05:18:12 -0500 (EST)
- Organization: The Open University, Milton Keynes, UK
- References: <gmrmfv$a0o$1@smc.vnet.net>
In article <gmrmfv$a0o$1 at smc.vnet.net>,
Julio <oiluj1 at gmail.com> wrote:

> I had a Mathematica program that, among other things, had to import
> html code from a web site.
>
> Recently they made some changes in the web site and the program is not
> importing properly anymore. Instead of getting the source html code I
> would see with my internet browser I am just getting a simplified
> version of it (with no html syntax at all).

As specified in the documentation about the HTML file format [1],
"Import["file.html"] gives a plain text representation of an HTML file."

> I do not know what may have changed, but as a guess, it seems to me as
> if the web site realized that it is not a proper internet browser the
> one who is connecting and therefore it is sending back a simplified
> version of the page.

Import behaves the same way with versions 6.0.3 and 7.0 on two
different platforms. Perhaps you have changed something on your system.

> As an example, you can just run:
> Import["http://www.atpworldtour.com/5/en/vault/draws.asp?TournamentID=
> \
> 352&TournamentYear=1993"]

Anyway, add the element "Source" to your query to get the HTML source
of the page.

In[2]:= Import["http://www.atpworldtour.com/5/en/vault/draws.asp?\
TournamentID=352&TournamentYear=1993", "Source"]

Out[2]= "
<!-- PAGE TITLE -->
<html><head><title>atpworldtour.com - Event Draw</title>
<!-- include virtual = \"/en/common/top/header_nrbt.asp\" -->
<meta name=\"robots\" content=\"noindex, nofollow\">
<meta http-equiv=\"content-type\" \
content=\"text/html;charset=iso-8859-1\">

[... Very long block of HTML code deleted ...]

<td><img src=\"space.gif\" height=\"0\" width=\"100\"></td>
</tr>
</table>
</div>
</body>
</html>
"

Regards,
--Jean-Marc

[1] "HTML (.html, .htm)", doc centre ref/format/HTML, web
http://reference.wolfram.com/mathematica/ref/format/HTML.html