Hello List,
my CrawlDb contains some urls:
nutch readdb crawl/crawldb -stats
CrawlDb statistics start: crawl/crawldb
Statistics for CrawlDb: crawl/crawldb
TOTAL urls: 1832
retry 0: 1832
min score: 1.0
avg score: 1.0
max score: 1.0
status 1 (db_unfetched): 1832
CrawlDb statistics: done
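
If it helps, I can also print a single entry with readdb, e.g. (the url below is just a placeholder for one of my injected seeds):

nutch readdb crawl/crawldb -url http://www.example.com/

which should show the status, fetch time and metadata stored for that url.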
but the generator always returns "0 records selected", even with the
-noNorm and -noFilter options:
nutch generate crawl/crawldb crawl/segments -topN 100 -noNorm -noFilter
Generator: starting at 2011-12-08 03:37:20
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: false
Generator: topN: 100
Generator: 0 records selected for fetching, exiting …
What prevents the generator from selecting urls for fetching?
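My next idea is to dump the CrawlDb and look at the stored fetch times, roughly like this (the dump directory name is just an example):

nutch readdb crawl/crawldb -dump crawl/crawldb_dump
head crawl/crawldb_dump/part-00000

If I understand the "Selecting best-scoring urls due for fetch" message correctly, the generator only picks entries whose fetch time is not in the future, so that is what I would check in the dump.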
Any hints?
Greets,
Rafael.