hadoop-common-user mailing list archives

From: Barry Haddow <bhad...@inf.ed.ac.uk>
Subject: datanode errors during nutch crawl
Date: Wed, 30 Jan 2008 10:48:37 GMT
Hi

I'm trying to set up a Nutch crawl using Hadoop, but the crawl usually stops at
depth 0, and occasionally it reaches depth 1. It should continue to depth 3.
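
For reference, I'm launching the crawl with something along these lines (the
seed directory, output path and topN value below are just placeholders):

  # seed URLs in urls/, output in crawl/, fetch 3 levels deep
  bin/nutch crawl urls -dir crawl -depth 3 -topN 1000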

I think the problem may be in Hadoop, since I'm seeing various errors in the
datanode log files, such as:

2008-01-30 10:27:51,487 WARN  dfs.DataNode - Failed to transfer 
blk_3160625876530276979 to 129.215.164.52:51010 got java.net.SocketException: 
Connection reset

I can telnet to this IP/port, so I don't think it's firewalled.
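
That check was nothing more than a plain telnet to the address from the log
message:

  # run from the node that reported the failed transfer
  telnet 129.215.164.52 51010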

Also:
2008-01-30 10:27:17,157 ERROR dfs.DataNode - DataXceiver: java.io.IOException: 
Block blk_-3070006959369401863 has already been started (though not 
completed), and thus cannot be created.
2008-01-30 10:27:56,217 ERROR dfs.DataNode - DataXceiver: java.io.IOException: 
Block blk_-712543843244766261 is valid, and cannot be written to.
2008-01-30 10:34:59,510 ERROR dfs.DataNode - DataXceiver: java.io.EOFException

I assume these errors are indicative of some problem in my Hadoop
configuration, but I can't see what.
I'm using Hadoop 0.15.0, as distributed with the Nutch build of 2008-01-25.
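
Would it be worth running the standard DFS health checks to rule out
filesystem problems? I was thinking of something like:

  # summarise capacity and state of each datanode
  bin/hadoop dfsadmin -report
  # look for missing, corrupt or under-replicated blocks
  bin/hadoop fsck /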

Any suggestions?
thanks and regards
Barry
