nutch-user mailing list archives

From "Eggebrecht, Thomas (GfK Marktforschung)" <>
Subject Parameter tuning or how to accelerate fetching
Date Mon, 29 Aug 2011 15:33:48 GMT
Dear List,

My process fetches from only 10 domains, but they are very big, with millions of pages on each
site. I now wonder why, after 2 weeks and 17 crawl-fetch cycles, I have only about 30,000 pages,
and the crawl seems to be stagnating.

How would you accelerate fetching?

My current parameters (using Nutch-1.2):
topN: 40,000
depth: 8
adddays: 30
fetcher.server.delay: 1 500
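
For scale, here is a rough sketch of the ceiling that the politeness delay alone would impose. It assumes the delay value above means 1.5 seconds and that one fetcher thread runs per host; both are assumptions, and max_pages is a hypothetical helper, not part of Nutch. It also ignores download time, so the real ceiling would be lower.

```python
def max_pages(hosts, delay_s, threads_per_host, seconds):
    """Upper bound on pages fetched when every host is throttled by a fixed delay.

    Ignores download time, so the real number is lower.
    """
    # Multiply first, divide once at the end, to keep the arithmetic exact here.
    return int(hosts * threads_per_host * seconds / delay_s)


TWO_WEEKS_S = 14 * 24 * 3600  # 1,209,600 seconds

# Assumed values: 10 hosts, 1.5 s delay, 1 thread per host.
ceiling = max_pages(hosts=10, delay_s=1.5, threads_per_host=1, seconds=TWO_WEEKS_S)
print(ceiling)  # 8064000
```

Under these assumptions the delay would still allow roughly 8 million fetches in two weeks, far more than the 30,000 pages observed, so the delay by itself does not seem to explain the stagnation.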

All parameters not mentioned have their default values, and regex-urlfilter.txt is also unchanged.

Best Regards


GfK SE, Nuremberg, Germany, commercial register Nuremberg HRB 25014; Management Board: Professor
Dr. Klaus L. Wübbenhorst (CEO), Pamela Knapp (CFO), Dr. Gerhard Hausruckinger, Petra Heinlein,
Debra A. Pruent, Wilhelm R. Wessels; Chairman of the Supervisory Board: Dr. Arno Mahlert
