MathGroup Archive 2012

Re: Import HTTP data in asynchronous/parallel way

  • To: mathgroup at smc.vnet.net
  • Subject: [mg125968] Re: Import HTTP data in asynchronous/parallel way
  • From: David Bailey <dave at removedbailey.co.uk>
  • Date: Tue, 10 Apr 2012 02:29:32 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <jlp34d$14s$1@smc.vnet.net>

On 07/04/2012 10:58, Rodrigo Murta wrote:
> Hi All
>
>      I'm testing some web scraping using Mathematica and would like to
> know how to work with Import in an asynchronous/parallel way.
>      Now I'm using Parallelize to do that. It works, but it doesn't look
> like the best way to do it, because of the limit on the number of kernels.
>
>     My code is like this:
>     result = Parallelize[Import /@ urlList]
>
>    How can I do it in another, asynchronous way? Something like a
> background process using & in bash?
>    Can I do it inside Mathematica? I know that I could speed up my
> scraping a lot with that.
>
> tks in advance
> Murta
>
Since nobody else has offered a better suggestion, I'll suggest that you
pull the data over using Java, which you can call via J/Link in a
totally seamless way.

I'd write a simple Java class that has a public, static method that 
takes a list of URLs to try and uses separate threads to fetch them in
parallel. If it kept a tally of its progress in an array:

public static boolean[] finished = new boolean[10];

Then you could monitor the progress from Mathematica and read the data 
in the order that it got delivered.
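A minimal sketch of such a class, assuming Java 8 or later. The names ParallelFetcher, fetchAll, finishedFlags, countFinished and result are hypothetical, not part of J/Link. One deliberate change from the public boolean array above: the flags sit behind synchronized methods, because a plain array written by worker threads is not guaranteed to look up to date to another thread that reads it without synchronization. It starts one thread per URL and returns at once, so the caller is free to poll.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of J/Link): downloads a list of URLs on
// separate threads and records per-URL progress so a caller can poll it.
public class ParallelFetcher {
    // finished[i] is set once URL i has been handled, successfully or not.
    private static boolean[] finished = new boolean[0];
    // results[i] holds the page text, or "ERROR: ..." if the download failed.
    private static String[] results = new String[0];

    // Starts one background thread per URL and returns immediately.
    public static synchronized void fetchAll(String[] urls) {
        finished = new boolean[urls.length];
        results = new String[urls.length];
        final boolean[] doneFlags = finished; // arrays belonging to this batch
        final String[] texts = results;
        for (int i = 0; i < urls.length; i++) {
            final int index = i;
            final String url = urls[i];
            Thread worker = new Thread(() -> {
                String text;
                try {
                    text = download(url);
                } catch (Exception e) {
                    text = "ERROR: " + e;
                }
                // Publish under the class lock so pollers see a consistent state.
                synchronized (ParallelFetcher.class) {
                    texts[index] = text;
                    doneFlags[index] = true;
                }
            });
            worker.setDaemon(true); // do not keep the JVM alive for stragglers
            worker.start();
        }
    }

    // Snapshot of the progress flags for the current batch.
    public static synchronized boolean[] finishedFlags() {
        return finished.clone();
    }

    // Number of URLs handled so far in the current batch.
    public static synchronized int countFinished() {
        int n = 0;
        for (boolean f : finished) {
            if (f) n++;
        }
        return n;
    }

    // Text for URL i (0-based), or null if it has not arrived yet.
    public static synchronized String result(int i) {
        return results[i];
    }

    // Reads the whole resource behind the URL and decodes it as UTF-8.
    private static String download(String url) throws Exception {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Once the class is loaded through J/Link, the idea would be for Mathematica to call ParallelFetcher`fetchAll[urlList] (which returns immediately), poll ParallelFetcher`finishedFlags[] from time to time, and pick up ParallelFetcher`result[i] as each flag turns True (Java indices start at 0). One thread per URL keeps the sketch short; for long lists a bounded pool would be kinder to the remote servers.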

To be clear, this would involve writing and compiling a Java class, and 
using AddToClassPath to make it available to Mathematica's J/Link. 
Constructing such a thing in pure J/Link code would be very tough!
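Since the class has to be written and compiled anyway, it can also do the bookkeeping for delivery order itself. The sketch below, again assuming Java 8 or later, swaps the flag array for a standard java.util.concurrent LinkedBlockingQueue fed by a fixed-size thread pool; DeliveryQueueFetcher, start and next are hypothetical names. Each worker appends an {index, text} pair when its download ends, so successive calls to next return pages in the order they arrived, and maxThreads caps how many connections are open at once.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical class (not part of J/Link): downloads URLs on a bounded pool of
// worker threads and queues each {index, text} pair as soon as it finishes.
public class DeliveryQueueFetcher {
    // Queue for the current batch; volatile so next() sees the latest batch.
    private static volatile BlockingQueue<String[]> delivered = new LinkedBlockingQueue<>();

    // Starts the downloads and returns immediately; at most maxThreads run at once.
    public static void start(String[] urls, int maxThreads) {
        // A fresh queue per batch, so stragglers from an earlier batch cannot mix in.
        final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
        delivered = queue;
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads), task -> {
            Thread t = new Thread(task);
            t.setDaemon(true); // do not keep the JVM alive for unfinished downloads
            return t;
        });
        for (int i = 0; i < urls.length; i++) {
            final int index = i;
            final String url = urls[i];
            pool.execute(() -> {
                String text;
                try {
                    text = download(url);
                } catch (Exception e) {
                    text = "ERROR: " + e;
                }
                queue.add(new String[] { Integer.toString(index), text });
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted downloads still run
    }

    // Next {index, text} pair in delivery order, or null if none arrives in time.
    public static String[] next(long timeoutMillis) throws InterruptedException {
        return delivered.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Reads the whole resource behind the URL and decodes it as UTF-8.
    private static String download(String url) throws Exception {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Compiled with javac DeliveryQueueFetcher.java, the class could then presumably be used from Mathematica along these lines: Needs["JLink`"], InstallJava[], AddToClassPath[dir] (dir standing for wherever the .class file was written), LoadJavaClass["DeliveryQueueFetcher"], then DeliveryQueueFetcher`start[urlList, 8] followed by repeated DeliveryQueueFetcher`next[1000] calls until Length[urlList] pairs have come back. A Null result just means nothing new arrived within that second.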

David Bailey
http://www.dbaileyconsultancy.co.uk


