Re: Importing HTML tables
- To: mathgroup at smc.vnet.net
- Subject: [mg54881] Re: [mg54843] Importing HTML tables
- From: Mitch.Stonehocker at sungard.com
- Date: Sat, 5 Mar 2005 01:34:26 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
I've been playing with this idea for some time. The closest I've come is:

1) url = "http://some.url";

2) rawPage = Import[url, "Text"];

3) scrapeFunc[rawPage_, string1_, string2_] :=
     StringTake[
       StringCases[rawPage, ShortestMatch[string1 ~~ __ ~~ string2]],
       {StringLength[string1] + 1, -(StringLength[string2] + 1)}];

where string1 and string2 are strings in rawPage on opposite sides of the text you want to extract. Typically, but not always, string1 and string2 are HTML tags or contain unique pieces of them. StringCases returns every shortest match that starts with string1 and ends with string2, and StringTake then strips the two delimiters from each match.

It may take some experimenting to get going, but it works well under most conditions. Also keep in mind that if the author of the page at url ever changes the HTML tags you use in string1 and string2, you will have some work to do.

I'd be very interested if someone has a more generalized approach.

Cheers,
Mitch Stonehocker

-----Original Message-----
From: Ian Roberts [mailto:ian at quantica.com.au]
To: mathgroup at smc.vnet.net
Subject: [mg54881] [mg54843] Importing HTML tables

Has anyone written a package for reading an HTML file and translating HTML tables into Lists?

Ian
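
As a rough illustration of how the scrapeFunc idea in the reply could be pointed at a table, here is a minimal sketch. It rests on several assumptions that are not in the original post: a later Mathematica version in which Shortest is available in place of ShortestMatch, a hypothetical helper name scrapeBetween, and a made-up inline HTML string standing in for Import[url, "Text"]. It also assumes the tags carry no attributes and the table is not nested, so treat it as a starting point rather than a general HTML parser.

```mathematica
(* Hypothetical sample page; in practice this would come from Import[url, "Text"] *)
rawPage = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>";

(* Hypothetical helper: return every shortest run of text between string1 and string2.
   Naming the inner pattern lets the rule return it directly, so no StringTake is needed. *)
scrapeBetween[page_String, string1_String, string2_String] :=
  StringCases[page, string1 ~~ Shortest[inner__] ~~ string2 :> inner];

(* First split the page into rows, then split each row into cells *)
rows = scrapeBetween[rawPage, "<tr>", "</tr>"];
cells = scrapeBetween[#, "<td>", "</td>"] & /@ rows
```

With the sample string above, cells should come out as {{"1", "2"}, {"3", "4"}}, a list of lists of strings; applying ToExpression to the entries would turn them into numbers. Real pages usually have attributes inside the tags (for example <td align="right">), in which case string1 would need to be a pattern rather than a literal string.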