hadoop-common-user mailing list archives

From Jason Stubblefield <jason.stu...@gmail.com>
Subject [No Subject]
Date Tue, 12 Jun 2007 19:59:46 GMT
Hi

I am having a problem with the nutch-0.9 fetcher. During the fetch
process I get the following output in my hadoop.log:

2007-06-12 12:23:25,892 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-06-12 12:23:25,892 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-06-12 12:23:25,892 INFO  plugin.PluginRepository -         Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-06-12 12:23:25,892 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-06-12 12:23:25,905 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-06-12 12:23:25,905 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-06-12 12:23:25,905 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-06-12 12:23:25,905 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-06-12 12:23:25,990 WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default

This WARN line is the last message logged before the process starts
using 100% of the system resources. It never exits or reports any
other errors.

I am using the local file system on a single machine without
MapReduce. I have tried several configurations, including JDK 5 and
JDK 6, with the same result. I have successfully crawled a different
list of URLs with exactly the same settings on the same machine.
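For reference, the WARN line in the log above says the normalizer found no rules for the 'outlink' scope and fell back to its default rules. A minimal sketch of that lookup-with-fallback behaviour, assuming a simple scope-to-rules map; the class and rule names are hypothetical and this is not Nutch's RegexURLNormalizer code:

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(name)s - %(message)s")
log = logging.getLogger("ScopedRulesSketch")


class ScopedRules:
    """Hypothetical scope -> rules lookup with a default fallback (not Nutch code)."""

    def __init__(self, default_rules):
        self.default_rules = list(default_rules)
        self.scoped = {}

    def add_scope(self, scope, rules):
        # Register rules that apply only to the named scope.
        self.scoped[scope] = list(rules)

    def rules_for(self, scope):
        rules = self.scoped.get(scope)
        if not rules:
            # No scope-specific rules: log a warning worded like the one above
            # and return the default rule set instead of failing.
            log.warning("can't find rules for scope '%s', using default", scope)
            return self.default_rules
        return rules


normalizer = ScopedRules(default_rules=["default-rule"])
normalizer.add_scope("fetcher", ["fetcher-rule"])
print(normalizer.rules_for("fetcher"))   # ['fetcher-rule']
print(normalizer.rules_for("outlink"))   # ['default-rule'], after logging a warning
```

In this sketch the missing scope is only a warning: processing continues with the defaults, which matches the "using default" wording of the logged message.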

~Jason


Jason Stubblefield
jason.stubby@gmail.com

Please enjoy one of my web properties:

http://www.geothingy.com/
http://www.fivemushrooms.com/
http://www.wikitourist.com/


