hadoop-common-user mailing list archives

From jibjoice <sudarat_...@hotmail.com>
Subject Re: Nutch crawl problem
Date Mon, 07 Jan 2008 01:30:38 GMT

Why can I crawl http://game.search.com but not
http://www.search.com? My conf/crawl-urlfilter is:

# skip file:, ftp:, & mailto: urls

# skip image and other suffixes we can't yet parse

# skip URLs containing certain characters as probable queries, etc.

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops

# accept hosts in MY.DOMAIN.NAME

# skip everything else

Also, some hosts I can't crawl at all; I get the error "Generator: 0 records
selected for fetching, exiting ...". I use the same config for every host. Why?
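
One way to see why one host passes and another does not is to replay the filter rules by hand. The sketch below is only an illustration under stated assumptions: the rule patterns are hypothetical stand-ins (the actual patterns are not shown in the message above), and it assumes the usual regex URL filter behaviour where "+" accepts, "-" rejects, the first rule whose pattern is found in the URL decides, and a URL matching no rule is dropped. The names RULES and first_match are made up for this example.

```python
import re

# Hypothetical rules, NOT the poster's actual file: the real patterns are
# missing from the message. "+" means accept, "-" means reject.
RULES = [
    "-^(file|ftp|mailto):",                      # skip file:, ftp:, mailto:
    "-[?*!@=]",                                  # skip probable query URLs
    "+^http://([a-z0-9]*\\.)*game.search.com/",  # accept one assumed host
    "-.",                                        # skip everything else
]

def first_match(url, rules):
    """Return (accepted, rule) for the first rule whose pattern occurs in url."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+", rule
    # Assumption: a URL that matches no rule is treated as rejected.
    return False, None

for url in ("http://game.search.com/", "http://www.search.com/"):
    accepted, rule = first_match(url, RULES)
    print(url, "accepted" if accepted else "rejected", "by", rule)
```

With these assumed rules, the accept pattern names only game.search.com, so http://www.search.com/ falls through to the final "-." rule. If every seed URL is rejected this way, nothing is left to select for fetching, which may be what the "Generator: 0 records selected" message reflects.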