nutch-user mailing list archives

From Emmanuel <>
Subject Dedup
Date Thu, 02 Aug 2007 16:02:01 GMT
The dedup process is quite useful; unfortunately, the URLs of the deleted
content are not removed from the CrawlDb.

Don't you think we could either remove them from the DB, or change their status
and fetch interval so that they are not fetched again so quickly?
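
To make the suggestion concrete, here is a rough sketch of the two options. It
uses a plain in-memory dict as a stand-in for the CrawlDb and is not Nutch's
actual API: CrawlEntry, STATUS_DUPLICATE, mark_duplicates and the 90-day
interval are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical status values; these are NOT Nutch's real CrawlDatum statuses.
STATUS_FETCHED = "fetched"
STATUS_DUPLICATE = "duplicate"

DAY = 24 * 60 * 60  # seconds


@dataclass
class CrawlEntry:
    """Hypothetical stand-in for one CrawlDb record."""
    status: str
    fetch_interval: int  # seconds until the next fetch is due


def mark_duplicates(crawldb, duplicate_urls, long_interval=90 * DAY, remove=False):
    """Apply one of the two proposed options to every URL dropped by dedup.

    remove=True  -> delete the entry from the db.
    remove=False -> keep the entry, flag it as a duplicate, and push its
                    fetch interval out so it is not refetched soon.
    """
    for url in duplicate_urls:
        if url not in crawldb:
            continue  # nothing to update for unknown URLs
        if remove:
            del crawldb[url]
        else:
            entry = crawldb[url]
            entry.status = STATUS_DUPLICATE
            # Never shorten an interval that is already longer.
            entry.fetch_interval = max(entry.fetch_interval, long_interval)
    return crawldb


crawldb = {
    "http://example.com/a": CrawlEntry(STATUS_FETCHED, 30 * DAY),
    "http://example.com/a?copy": CrawlEntry(STATUS_FETCHED, 30 * DAY),
}
mark_duplicates(crawldb, ["http://example.com/a?copy"])
```

Keeping the entry with a changed status (rather than removing it) would stop
the same URL from being rediscovered through links and added again as new.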
