MathGroup Archive 2006

Re: importing html files

  • To: mathgroup at smc.vnet.net
  • Subject: [mg70043] Re: importing html files
  • From: bghiggins at ucdavis.edu
  • Date: Sun, 1 Oct 2006 04:08:50 -0400 (EDT)
  • References: <efledp$dfa$1@smc.vnet.net>

Use JLink. Here is some code, based on a suggestion by Rolf Mertig, that
will read in a web page. First you need to load the JLink package:

Needs["JLink`"];

ImportMyURL[url_String] := JavaBlock[
  Module[{u, s = {}, stream, numRead, buf},
    InstallJava[];
    u = JavaNew["java.net.URL", url];
    stream = u@openStream[];
    If[stream === $Failed, Return[$Failed]];
    (* 5000 is an arbitrary buffer size. *)
    buf = JavaNew["[B", 5000];
    (* read returns the number of bytes read, or -1 at end of stream. *)
    While[(numRead = stream@read[buf]) > 0,
      AppendTo[s, Take[Val[buf], numRead]]];
    stream@close[];
    (* Java bytes are signed, so map them to 0..255 before decoding. *)
    FromCharacterCode[Mod[Flatten[s], 256]]]];

ImportMyURL["http://www.higgins.ucdavis.edu"]

The result is one large string.  Once you have the string, you can use
StringToStream to open an input stream on it and then use operations
like Read to get at specific data. Regular expressions can also help in
extracting table data.
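As a rough sketch of both ideas, assume the page contains a simple HTML
table. The string html below is a made-up stand-in for the result of
ImportMyURL, and readNumber is a hypothetical helper name, not something
from JLink. The pattern only handles plain, un-nested <td> cells, so
treat it as a starting point rather than a general HTML parser:

```mathematica
(* Stand-in for the string returned by ImportMyURL (hypothetical data). *)
html = "<table><tr><td>1.5</td><td>2.25</td></tr><tr><td>3</td><td>4</td></tr></table>";

(* Extract the text of every <td> cell with a regular expression. *)
cells = StringCases[html,
   RegularExpression["<td[^>]*>(.*?)</td>"] -> "$1"];

(* Convert a cell's text to a number by reading it from a string stream. *)
readNumber[str_String] := Module[{strm = StringToStream[str], val},
   val = Read[strm, Number];
   Close[strm];
   val];

readNumber /@ cells
```

For real pages the cell contents often include extra markup or
whitespace, so the pattern will usually need adjusting to the table at
hand.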

Note that there is an example in the JLink documentation of code for a
similar function called GetURL, but I always had some problems with it
and prefer Mertig's approach.


Hope this helps


Cheers,
Brian


hawkmoon269 wrote:
> I'd like to read data from a table on a non-local web page into
> Mathematica.  For the time being it would suffice to just be able to
> read all of the source code of the page in...I know how to use Import
> to do this with a local file, but it's not clear to me how to access a
> non-local file...Help?
> 
> h

