MathGroup Archive: March 2005 [00139]

Re: Importing HTML tables

To: mathgroup at smc.vnet.net
Subject: [mg54881] Re: [mg54843] Importing HTML tables
From: Mitch.Stonehocker at sungard.com
Date: Sat, 5 Mar 2005 01:34:26 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

I've been playing with this idea for some time.  The closest I've come
is:
1) url = "http://some.url";
2) rawPage = Import[url, "Text"];
3) scrapeFunc[rawPage_, string1_, string2_] :=
     StringTake[
       StringCases[rawPage, ShortestMatch[string1 ~~ __ ~~ string2]],
       {StringLength[string1] + 1, -(StringLength[string2] + 1)}];

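where string1 and string2 are strings in rawPage on opposite sides of
each piece of text you want to extract.  Typically, but not always,
string1 and string2 are HTML tags, or contain unique fragments of them.
scrapeFunc returns a list with one entry per match, with string1 and
string2 themselves stripped off.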
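
As a quick illustration, here is a minimal sketch that applies scrapeFunc
from step 3 to a made-up HTML fragment instead of a downloaded page.  The
name sample and its contents are assumptions for the example only; a real
page will need its own choice of string1 and string2.

```mathematica
(* Hypothetical input standing in for rawPage *)
sample = "<table><tr><td>1.5</td><td>2.5</td></tr></table>";

(* Pull out the text between each <td> ... </td> pair *)
cells = scrapeFunc[sample, "<td>", "</td>"]

(* The cells come back as strings; convert them if numbers are wanted *)
values = ToExpression /@ cells
```

With the definition in step 3, cells should come out as {"1.5", "2.5"}
and values as {1.5, 2.5}.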

It may take some experimenting to get going, but it works well under
most conditions.  Also keep in mind that if the author of the page at
url ever changes the HTML tags you use in string1 and string2, you will
have some work to do.
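
For a whole table, one possibility is to apply scrapeFunc twice: once to
split the page into rows and once more to split each row into cells.
This is only a sketch under the assumption that the rows and cells are
delimited by plain <tr> and <td> tags with no attributes; the sample
string is invented for the example.

```mathematica
(* Hypothetical two-row table standing in for rawPage *)
sample = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>";

(* First pass: one string per table row *)
rows = scrapeFunc[sample, "<tr>", "</tr>"];

(* Second pass: split each row into its cells, giving a list of lists *)
table = Map[scrapeFunc[#, "<td>", "</td>"] &, rows]
```

For this sample the result should be {{"a", "b"}, {"c", "d"}}.  Note that
the __ in step 3 requires at least one character between string1 and
string2, so empty cells are not handled by this sketch.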

I'd be very interested if someone has a more generalized approach.

Cheers,

Mitch Stonehocker




-----Original Message-----
From: Ian Roberts [mailto:ian at quantica.com.au]
To: mathgroup at smc.vnet.net
Subject: [mg54881] [mg54843] Importing HTML tables


Has anyone written a package for reading an HTML file and translating 
HTML tables into Lists?
Ian
