Re: importing html files
- To: mathgroup at smc.vnet.net
- Subject: [mg70043] Re: importing html files
- From: bghiggins at ucdavis.edu
- Date: Sun, 1 Oct 2006 04:08:50 -0400 (EDT)
- References: <efledp$dfa$1@smc.vnet.net>
Use JLink. Here is some code, based on a suggestion by Rolf Mertig, that reads in a web page. First you need to load the JLink package:

    Needs["JLink`"];

    ImportMyURL[url_String] :=
      JavaBlock[
        Module[{u, s = {}, stream, numRead, buf},
          InstallJava[];
          u = JavaNew["java.net.URL", url];
          stream = u@openStream[];
          If[stream === $Failed, Return[$Failed]];
          (* 5000 is an arbitrary buffer size. *)
          buf = JavaNew["[B", 5000];
          (* read returns the number of bytes read, or -1 at end of stream *)
          While[(numRead = stream@read[buf]) > 0,
            AppendTo[s, Take[Val[buf], numRead]]];
          stream@close[];
          (* Java bytes are signed (-128..127); Mod maps them to 0..255 *)
          FromCharacterCode[Mod[Flatten[s], 256]]]];

    ImportMyURL["http://www.higgins.ucdavis.edu"]

The result is one large string. Once you have the string, you can use StringToStream to open an input stream on it and then use operations like Read to get at specific data. You can also use regular expressions to help extract table data.

Note that the JLink documentation has example code for a similar function called GetURL, but I always had some problems with it and prefer Mertig's approach.

Hope this helps

Cheers,
Brian

hawkmoon269 wrote:
> I'd like to read data from a table on a non-local web page into
> Mathematica. For the time being it would suffice to just be able to
> read all of the source code of the page in...I know how to use Import
> to do this with a local file, but it's not clear to me how to access a
> non-local file...Help?
>
> h