MathGroup Archive 2005

Re: Importing HTML tables

  • To: mathgroup at
  • Subject: [mg54881] Re: [mg54843] Importing HTML tables
  • From: Mitch.Stonehocker at
  • Date: Sat, 5 Mar 2005 01:34:26 -0500 (EST)
  • Sender: owner-wri-mathgroup at

I've been playing with this idea for some time.  The closest I've come
is the following:

url = "http://some.url";
rawPage = Import[url, "Text"];
(* return the text found between each string1 ... string2 pair *)
scrapeFunc[rawPage_, string1_, string2_] :=
  StringTake[
    StringCases[rawPage, ShortestMatch[string1 ~~ __ ~~ string2]],
    {StringLength[string1] + 1, -(StringLength[string2] + 1)}];

where string1 and string2 are strings that appear in rawPage immediately
before and after the bits of text you want.  Typically, but not always,
string1 and string2 are, or contain unique bits of, HTML tags.

It may take some experimenting to get going, but it works well under most
conditions.  Also keep in mind that if the author of the page at url ever
changes the HTML tag bits you use in string1 and string2, you will have
some work to do.
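
As a rough sketch of how the same delimiter idea might be applied to a
table, the fragment below pulls out the rows first and then the cells of
each row.  The sample html string, the helper name between, and the plain
"<tr>"/"<td>" delimiters are assumptions made for illustration only; real
pages often carry attributes inside the tags, so the delimiters would need
adjusting.  It also assumes a version of Mathematica in which Shortest is
available in string patterns (in place of ShortestMatch).

```mathematica
(* hypothetical sample page; a real one would come from Import[url, "Text"] *)
html = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>";

(* text found between each start ... stop pair, delimiters excluded *)
between[text_String, start_String, stop_String] :=
  StringCases[text, start ~~ Shortest[body__] ~~ stop :> body];

(* first split the page into rows, then split each row into cells *)
rows = between[html, "<tr>", "</tr>"];
cells = between[#, "<td>", "</td>"] & /@ rows
```

For this sample, cells should come out as a list of two rows with two
strings each.  The entries are still strings; something like ToExpression
could be mapped over them if numbers are wanted.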

I'd be very interested if someone has a more generalized approach.


Mitch Stonehocker

-----Original Message-----
From: Ian Roberts [mailto:ian at]
To: mathgroup at
Subject: [mg54881] [mg54843] Importing HTML tables

Has anyone written a package for reading an HTML file and translating 
HTML tables into Lists?
