Re: Import HTTP data in asynchronous/parallel way
- To: mathgroup at smc.vnet.net
- Subject: [mg125968] Re: Import HTTP data in asynchronous/parallel way
- From: David Bailey <dave at removedbailey.co.uk>
- Date: Tue, 10 Apr 2012 02:29:32 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jlp34d$14s$1@smc.vnet.net>
On 07/04/2012 10:58, Rodrigo Murta wrote:
> Hi All
>
> I'm testing some web scraping using Mathematica and would like to
> know how to work with Import in an asynchronous/parallel way.
> Now I'm using Parallelize to do that. It works but it doesn't look
> like the best way to do that, due to the kernel number limitation.
>
> My code looks like this:
> result = Parallelize[Import/@urlList]
>
> How can I do it in another asynchronous way? Something like a
> background process using & in bash?
> Can I do it inside Mathematica? I know that I could speed up my
> scraping a lot with that.
>
> tks in advance
> Murta
>
Since nobody else has offered a better suggestion, I'll suggest that you
pull the data over using Java - which you can call via J/Link in a
totally seamless way.
I'd write a simple Java class that has a public, static method that
takes a list of URLs to try, and uses separate threads to run them in
parallel. If it kept a tally of its progress in an array:
public static boolean[] finished = new boolean[10];
Then you could monitor the progress from Mathematica and read the data
in the order that it got delivered.
To be clear, this would involve writing and compiling a Java class, and
using AddToClassPath to make it available to Mathematica's J/Link.
Constructing such a thing in pure J/Link code would be very tough!
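A minimal sketch of the kind of class described above might look like the following. The class name ParallelFetcher and the methods start, isFinished and getResult are hypothetical names chosen for illustration. Instead of exposing the progress array as a public field, this version keeps it private behind synchronized accessors so that values written by the worker threads are reliably visible to the caller. That is a design choice of the sketch, not a J/Link requirement. It starts one thread per URL and stores null for any fetch that fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: fetches several URLs at once, one thread per URL.
public class ParallelFetcher {
    // Progress tally and downloaded text for the most recent batch.
    private static boolean[] finished = new boolean[0];
    private static String[] results = new String[0];

    // Starts the downloads and returns immediately without waiting for them.
    public static synchronized void start(String[] urls) {
        final boolean[] done = new boolean[urls.length];
        final String[] data = new String[urls.length];
        finished = done;
        results = data;
        for (int i = 0; i < urls.length; i++) {
            final int slot = i;
            final String address = urls[i];
            Thread worker = new Thread(() -> {
                String text = null;
                try {
                    text = download(address);
                } catch (Exception e) {
                    // Assumption of this sketch: a failed fetch is reported as null.
                }
                // Publish the result under the same lock the accessors use.
                synchronized (ParallelFetcher.class) {
                    data[slot] = text;
                    done[slot] = true;
                }
            });
            worker.setDaemon(true); // do not keep the JVM alive for stuck fetches
            worker.start();
        }
    }

    // True once the fetch for URL number i has completed or failed.
    public static synchronized boolean isFinished(int i) {
        return finished[i];
    }

    // The downloaded text for URL number i, or null if pending or failed.
    public static synchronized String getResult(int i) {
        return results[i];
    }

    // Reads the whole response body, decoding it as UTF-8 (an assumption).
    private static String download(String address) throws Exception {
        try (InputStream in = new URL(address).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Once compiled and made visible with AddToClassPath, the class could be loaded with LoadJavaClass. Mathematica would then call start with the list of URLs, poll isFinished for each index, and call getResult as each fetch completes. Note that the Java indices are zero-based.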
David Bailey
http://www.dbaileyconsultancy.co.uk