Re: Import HTTP data in asynchronous/parallel way
- To: mathgroup at smc.vnet.net
- Subject: [mg125968] Re: Import HTTP data in asynchronous/parallel way
- From: David Bailey <dave at removedbailey.co.uk>
- Date: Tue, 10 Apr 2012 02:29:32 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jlp34d$14s$1@smc.vnet.net>
On 07/04/2012 10:58, Rodrigo Murta wrote:
> Hi All
>
> I'm testing some web scraping using Mathematica and would like to
> know how to work with Import in an asynchronous/parallel way.
> Now I'm using Parallelize to do that. It works but it doesn't look
> like the best way to do that, due to the kernel number limitation.
>
> My code is like that:
> result = Parallelize[Import/@urlList]
>
> How can I do it in another asynchronous way? Something like a
> background process using & in bash?
> Can I do it inside Mathematica? I know that I could speed up my
> scraping a lot with that.
>
> tks in advance
> Murta
>

Since nobody else has offered a better suggestion, I'll suggest that you pull the data over using Java, which you can call via J/Link in a totally seamless way. I'd write a simple Java class that has a public static method that takes a list of URLs to try and uses separate threads to fetch them in parallel. If it kept a tally of its progress in an array:

    public static boolean[] finished = new boolean[10];

then you could monitor the progress from Mathematica and read the data in the order that it got delivered.

To be clear, this would involve writing and compiling a Java class, and using AddToClassPath to make it available to Mathematica's J/Link. Constructing such a thing in pure J/Link code would be very tough!

David Bailey
http://www.dbaileyconsultancy.co.uk
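
The Java class described in the reply above might look roughly like the sketch below. It is only an illustration under several assumptions, not a tested scraper: the class name ParallelFetcher and the methods start, isFinished, getResult and waitAll are hypothetical names, Java 9 or later is assumed (for readAllBytes), pages are assumed to be UTF-8 text, and one thread is started per URL with no upper limit. The progress tally is kept private and read through synchronized accessors rather than exposed as a public array, so that the polling thread reliably sees updates made by the worker threads.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: fetches several URLs on separate threads
// and lets a caller (e.g. Mathematica via J/Link) poll for progress.
public class ParallelFetcher {
    private static final Object lock = new Object();
    private static boolean[] finished = new boolean[0];
    private static String[] results = new String[0];
    private static Thread[] workers = new Thread[0];

    /** Starts one daemon thread per URL and returns immediately. */
    public static void start(String[] urls) {
        int n = urls.length;
        final boolean[] done = new boolean[n];
        final String[] out = new String[n];
        Thread[] threads = new Thread[n];
        synchronized (lock) {
            finished = done;
            results = out;
            workers = threads;
        }
        for (int i = 0; i < n; i++) {
            final int idx = i;
            final String url = urls[i];
            threads[i] = new Thread(() -> {
                String text;
                try (InputStream in = URI.create(url).toURL().openStream()) {
                    // Assumption: the response body is UTF-8 text.
                    text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                } catch (Exception e) {
                    // Record the failure instead of losing it in the worker thread.
                    text = "ERROR: " + e;
                }
                synchronized (lock) {
                    out[idx] = text;
                    done[idx] = true;
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }
    }

    /** True once the i-th URL (0-based) has either been read or has failed. */
    public static boolean isFinished(int i) {
        synchronized (lock) {
            return finished[i];
        }
    }

    /** Page text, a string starting with "ERROR: ", or null if not finished. */
    public static String getResult(int i) {
        synchronized (lock) {
            return results[i];
        }
    }

    /** Blocks until every worker from the most recent start() call has ended. */
    public static void waitAll() {
        Thread[] threads;
        synchronized (lock) {
            threads = workers;
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

After compiling the class, the directory containing ParallelFetcher.class would be passed to AddToClassPath and the class loaded with LoadJavaClass["ParallelFetcher"]. The static methods should then be callable from Mathematica in the usual J/Link form, for example ParallelFetcher`start[urlList], followed by polling ParallelFetcher`isFinished[i] and reading ParallelFetcher`getResult[i] as each download completes. Note that the indices are 0-based on the Java side.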