Re: Problem Importing web site in Mathematica: How to bypass pages asking for login credentials
- To: mathgroup at smc.vnet.net
- Subject: [mg126450] Re: Problem Importing web site in Mathematica: How to bypass pages asking for login credentials
- From: Armand Tamzarian <mike.honeychurch at gmail.com>
- Date: Fri, 11 May 2012 00:14:33 -0400 (EDT)
On May 9, 5:53 pm, Mark Coleman <markspcole... at gmail.com> wrote:
> Hi,
>
> I'm using Mathematica v8 for some text mining/classification analysis of web
> sites. As part of this I first Import[] the hyperlinks from the web
> site's home page into a list, and then systematically traverse this
> list and Import each URL. In some cases, I hit a page or set of pages
> that requires a user to enter login credentials. At this point my code
> pops up the site's login screen and waits for manual input before
> proceeding. This obviously makes importing a large set of URLs
> infeasible.
>
> I'm wondering if it's possible to identify these pages in advance, so
> I can filter them out of my list of URLs, allowing me to automatically
> Import the remaining pages.
>
> Thanks,
>
> Mark

I use wget in combination with Mathematica to work around logins and cookies.

Mike
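Mike's reply doesn't show his wget setup. As a rough answer to Mark's actual question (spotting login pages before Import[] reaches them), here is a minimal sketch in Python rather than Mathematica. It assumes a protected page gives itself away in its first HTTP response, either with status 401/403 or with a redirect whose target path contains a word like "login". The hint list and the helper names (`looks_like_login`, `probe`, `filter_public`) are hypothetical, chosen for illustration only.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Assumption (heuristic, not a standard): words that often appear in the
# path of a site's login page. Adjust per site.
LOGIN_HINTS = ("login", "signin", "sign-in", "logon")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so they can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # makes urlopen raise HTTPError for the 3xx response


def looks_like_login(status, location=None):
    """Return True if a first response suggests credentials are required.

    status   -- HTTP status code of the first response
    location -- Location header of a redirect, if any
    """
    if status in (401, 403):  # explicit authentication/authorization refusal
        return True
    if 300 <= status < 400 and location:
        path = urlparse(location).path.lower()
        return any(hint in path for hint in LOGIN_HINTS)
    return False


def probe(url, timeout=10.0):
    """Return (status, Location header) of url's first response."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:  # 3xx (redirects blocked), 4xx, 5xx
        return err.code, err.headers.get("Location")


def filter_public(urls):
    """Keep only URLs whose first response does not look like a login wall."""
    public = []
    for url in urls:
        try:
            status, location = probe(url)
        except (OSError, ValueError):  # unreachable host or malformed URL
            continue
        if not looks_like_login(status, location):
            public.append(url)
    return public
```

The URLs returned by `filter_public` could then be handed to Import[]. A page that returns 200 and simply renders a login form is not caught by this check; for those pages, or to actually log in and keep the session, a cookie-aware tool such as wget, as Mike suggests, is the more robust route.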