nutch-dev mailing list archives

From Santiago Pérez <>
Subject Multithreaded parsing
Date Mon, 28 Dec 2009 11:09:25 GMT


I am developing a modification to Nutch that accepts only outlinks pointing
to Spanish-language URLs. I have implemented downloading and parsing of the
content of each outlink (in ParseOutputFormat) with Jericho and detecting the language with

This process seems too heavy, especially because it is done by a single
thread, so I would appreciate any ideas on the following:

Is there an easier way to detect the language of an outlink?
Is there a way to perform multithreaded outlink extraction, as the fetcher does?
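
To make the second question concrete, here is a minimal sketch of the kind of parallel check meant, using a plain java.util.concurrent thread pool. The names ParallelOutlinkFilter, filter, and accept are hypothetical and are not Nutch APIs; the accept predicate stands in for the per-outlink download, parse, and language check, and nothing here claims to mirror how the fetcher is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ParallelOutlinkFilter {

    // Hypothetical helper, not part of Nutch: keeps only the URLs accepted by
    // the supplied check, running the checks on a fixed-size thread pool.
    public static List<String> filter(List<String> urls,
                                      Predicate<String> accept,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per outlink; each task runs the (possibly slow) check.
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> accept.test(url));
            }
            // invokeAll blocks until every task has finished and returns the
            // futures in the same order as the submitted tasks.
            List<Future<Boolean>> results = pool.invokeAll(tasks);
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < urls.size(); i++) {
                if (results.get(i).get()) {
                    kept.add(urls.get(i));
                }
            }
            return kept;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, ParallelOutlinkFilter.filter(urls, u -> u.contains(".es"), 4) would keep only the URLs containing ".es"; that trivial predicate is just a placeholder for the real language check.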

Thanks in advance