nutch-user mailing list archives

From Doug Cutting <cutt...@nutch.org>
Subject Re: problem, Limiting dynamic pages with static URLs
Date Mon, 03 Oct 2005 17:23:35 GMT
Please see:

http://www.mail-archive.com/nutch-dev@incubator.apache.org/msg00634.html

Doug

Jon Shoberg wrote:
> Some sites use relative links and the fetcher is getting confused.  See 
> the example below:
> 
> http://www.domain.xyz/index.php/research/academics/research/libraries/
> 
> The content returned simply keeps following the few relative links and 
> the URI keeps building.  It is basically the same problem as session IDs, 
> but not something that can be cleanly regexed out.
> 
> Anyone see this before? Thoughts?
> 
> -j
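
To illustrate the symptom described above, here is a minimal sketch that flags URLs whose path repeats a segment. It is not the approach from the linked message and not part of Nutch. The function name has_repeated_segment and the max_repeats threshold are hypothetical, chosen only for this example.

```python
from collections import Counter
from urllib.parse import urlsplit


def has_repeated_segment(url: str, max_repeats: int = 1) -> bool:
    """Return True if any path segment occurs more than max_repeats times.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    path = urlsplit(url).path
    # Drop empty strings produced by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())


# The URL from the message repeats "research"; the second URL is a
# made-up variant without the repetition.
looping = "http://www.domain.xyz/index.php/research/academics/research/libraries/"
normal = "http://www.domain.xyz/index.php/research/academics/libraries/"

print(has_repeated_segment(looping))
print(has_repeated_segment(normal))
```

A check like this can reject legitimate URLs that happen to reuse a segment name, so a higher max_repeats may be a safer starting point.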
