nutch-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Nutch unfetched urls count
Date Mon, 06 Feb 2012 15:47:42 GMT
The filter is not responsible. If a URL is filtered out, it goes to /dev/null
and does not show up as a db_unfetched record.
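
To make that concrete for the 10-URL case in the question, here is a rough Python sketch. It is not Nutch code: the rules, the example.com and other.org URLs, and the accept() and update_db() helpers are all made up for illustration, and the rule handling (first matching rule wins, no match means rejected) is only a simplified model of how a regex URL filter behaves.

```python
import re

# Hypothetical rules in the style of a regex URL filter file:
# '+' accepts, '-' rejects, and the first matching rule decides.
RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$")),
    ("+", re.compile(r"^https?://example\.com/")),
]


def accept(url):
    """Return True if the URL passes the filter, False if it is dropped."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: the URL is dropped


def update_db(crawldb, outlinks):
    """Add newly discovered URLs that pass the filter as db_unfetched."""
    for url in outlinks:
        if accept(url):
            # Known URLs keep their status; new ones start as unfetched.
            crawldb.setdefault(url, "db_unfetched")
    return crawldb


# 10 outlinks found while parsing: 5 pass the filter, 5 do not.
outlinks = [f"http://example.com/page{i}" for i in range(5)]
outlinks += [f"http://other.org/page{i}" for i in range(5)]

crawldb = update_db({}, outlinks)
print(len(crawldb), "URLs recorded as db_unfetched")
```

In this model only the 5 URLs that pass the filter are recorded. The rejected ones never get a record at all, so they cannot account for a large db_unfetched count.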

On Friday 03 February 2012 09:37:08 nutchsolruser wrote:
> I have a huge count of db_unfetched URLs, and I just want to know in which
> cases a URL goes to the db_unfetched state. Suppose I found 10 URLs while
> parsing pages, and 5 of those 10 URLs satisfy my urlfilter.txt. In that
> case, will all 10 URLs go to db_unfetched, or will only the 5 URLs that
> satisfy my urlfilter.txt go into the unfetched queue in the update_db phase?
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-unfetched-urls-count-tp3712459p37
> 12459.html Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
