MathGroup Archive 2005

Re: Importing HTML tables

  • To: mathgroup at smc.vnet.net
  • Subject: [mg54881] Re: [mg54843] Importing HTML tables
  • From: Mitch.Stonehocker at sungard.com
  • Date: Sat, 5 Mar 2005 01:34:26 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

I've been playing with this idea for some time.  The closest I've come
is:
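(* 1) the page to scrape *)
url = "http://some.url";
(* 2) fetch the page source as a single string *)
rawPage = Import[url, "Text"];
(* 3) return the text found between each string1 ... string2 pair *)
scrapeFunc[rawPage_, string1_, string2_] :=
  StringTake[
    StringCases[rawPage, ShortestMatch[string1 ~~ __ ~~ string2]],
    {StringLength[string1] + 1, -(StringLength[string2] + 1)}];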

where string1 and string2 are strings in rawPage on opposite sides of
the bits of text you want from rawPage.  Typically, but not always,
string1 and string2 are, or contain unique bits of, HTML tags.
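
As a rough illustration of how the same idea might be applied to a table,
here is a minimal sketch.  The names samplePage and scrapeBetween and the
sample HTML are made up for this example, and it assumes a later
Mathematica version in which ShortestMatch is spelled Shortest.  It also
assumes bare <tr> and <td> tags with no attributes, which real pages often
do not have.

```mathematica
(* Hypothetical sample text standing in for Import[url, "Text"] *)
samplePage = "<table><tr><td>alpha</td><td>1</td></tr><tr><td>beta</td><td>2</td></tr></table>";

(* Return every piece of text found between left and right;
   naming the pattern x avoids the StringTake trimming step *)
scrapeBetween[page_String, left_String, right_String] :=
  StringCases[page, left ~~ Shortest[x__] ~~ right :> x];

(* Pull out the rows first, then the cells within each row *)
rows = scrapeBetween[samplePage, "<tr>", "</tr>"];
cells = scrapeBetween[#, "<td>", "</td>"] & /@ rows
(* {{"alpha", "1"}, {"beta", "2"}} *)
```

The result is a list of rows, each a list of cell strings.  Empty cells
are skipped by this sketch, because x__ needs at least one character.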

It may take some experimenting to get going, but it works well under most
conditions.  Also keep in mind that if the author of the page at url ever
changes the HTML tag bits you use in string1 and string2, you will have
some work to do.

I'd be very interested if someone has a more generalized approach.

Cheers,

Mitch Stonehocker




-----Original Message-----
From: Ian Roberts [mailto:ian at quantica.com.au]
To: mathgroup at smc.vnet.net
Subject: [mg54881] [mg54843] Importing HTML tables


Has anyone written a package for reading an HTML file and translating 
HTML tables into Lists?
Ian

