hadoop-common-user mailing list archives

From Kumar Pandey <kumar.pan...@gmail.com>
Subject Re: hadoop 0.19.0 and data node failure
Date Sat, 17 Jan 2009 01:52:26 GMT
Played with it a bit more and made a few observations which I thought I'd
share:

1) If replication is set to 2 and at least 2 datanodes are running, then
other datanodes going down doesn't affect writes or reads.
2) If the number of live datanodes is less than the replication factor,
writes still go through, provided the namenode has recognized that the
missing datanodes are dead. This can take up to 10 minutes by default, but
can be tuned with heartbeat.recheck.interval (see the sketch below).
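
As a minimal sketch (not from the thread; property names as shipped with
Hadoop 0.19), here is how those two settings could be applied on a client
Configuration. In a real deployment they would normally go in
hadoop-site.xml, and heartbeat.recheck.interval is read by the namenode at
startup, so setting it client-side is shown only for illustration:

import org.apache.hadoop.conf.Configuration;

public class ReplicationSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Number of replicas requested for each block of a new file.
        conf.set("dfs.replication", "2");
        // Interval (ms) at which the namenode rechecks datanode
        // heartbeats; lowering it shrinks the ~10 minute window before
        // a silent datanode is declared dead.
        conf.set("heartbeat.recheck.interval", "60000");
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}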



On Fri, Jan 16, 2009 at 6:56 AM, Kumar Pandey <kumar.pandey@gmail.com> wrote:

> Thanks Brian, I'll try with 3 datanodes, bringing one down with replication
> set to 2.
> I should probably go ahead and file a bug for the fact that although the
> write failed, the file was listed in the directory listing with size zero,
> and a subsequent write attempt with both nodes up failed with the following
> error:
>
>  Target jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav already
> exists
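>
> In the meantime, a hypothetical cleanup sketch (0.19 FileSystem API; the
> path is the one from the error above) that drops the stale zero-byte
> entry before retrying the copy:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class CleanupStaleEntry {
>     public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         Path p = new Path(
>             "jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav");
>         // Delete only the zero-length leftover of the failed write.
>         if (fs.exists(p) && fs.getFileStatus(p).getLen() == 0) {
>             fs.delete(p, false);
>         }
>         fs.close();
>     }
> }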
>
>
>
> On Fri, Jan 16, 2009 at 6:10 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>
>> Hey Kumar,
>>
>> Hadoop won't let you write new blocks if it can't write them at the right
>> replica level.
>>
>> You've requested to write a block with two replicas on a system where
>> there's only one datanode alive.  I'd hope that it wouldn't let you create a
>> new file!
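>>
>> For reference, a sketch of what the client side of that failing write
>> looks like, assuming the 0.19 FileSystem API (the path and sizes here
>> are illustrative): create() pins the requested replication, and the
>> write pipeline gives up when it can't find that many live datanodes:
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FSDataOutputStream;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>>
>> public class TwoReplicaWrite {
>>     public static void main(String[] args) throws Exception {
>>         FileSystem fs = FileSystem.get(new Configuration());
>>         FSDataOutputStream out = fs.create(
>>             new Path("jukebox/test.wav"), // illustrative target
>>             true,               // overwrite
>>             4096,               // io buffer size
>>             (short) 2,          // requested replicas: needs 2 live datanodes
>>             64L * 1024 * 1024); // block size
>>         out.write("test payload".getBytes());
>>         out.close();            // fails if the pipeline never formed
>>         fs.close();
>>     }
>> }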
>>
>> Brian
>>
>>
>> On Jan 16, 2009, at 12:02 AM, Kumar Pandey wrote:
>>
>>> To test hadoop's fault tolerance I tried the following setup:
>>> nodeA -- namenode and secondary namenode
>>> nodeB -- datanode
>>> nodeC -- datanode
>>>
>>> replication set to 2.
>>> When A, B, and C are running I'm able to make a round trip for a wav file.
>>>
>>> Now to test fault tolerance I brought nodeB down and tried to write a
>>> file.
>>> The write failed even though nodeC was up and running, with the following
>>> msg. More interestingly, the file was listed in the namenode with size
>>> zero. I would have expected hadoop to write the file to nodeC.
>>>
>>> ##############error msg###################
>>> [hadoop@cancunvm1 testfiles]$ hadoop fs -copyFromLocal
>>> 9979_D4FE01E0-DD119BDE-3000CB83-EB857348.wav
>>> jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav
>>>
>>> 09/01/16 01:47:09 INFO hdfs.DFSClient: Exception in
>>> createBlockOutputStream
>>> java.net.SocketTimeoutException
>>> 09/01/16 01:47:09 INFO hdfs.DFSClient: Abandoning block
>>> blk_4025795281260753088_1216
>>> 09/01/16 01:47:09 INFO hdfs.DFSClient: Waiting to find target node:
>>> 10.0.3.136:50010
>>> 09/01/16 01:47:18 INFO hdfs.DFSClient: Exception in
>>> createBlockOutputStream
>>> java.net.NoRouteToHostException: No route to host
>>> 09/01/16 01:47:18 INFO hdfs.DFSClient: Abandoning block
>>> blk_-2076345051085316536_1216
>>> 09/01/16 01:47:27 INFO hdfs.DFSClient: Exception in
>>> createBlockOutputStream
>>> java.net.NoRouteToHostException: No route to host
>>> 09/01/16 01:47:27 INFO hdfs.DFSClient: Abandoning block
>>> blk_2666380449580768625_1216
>>> 09/01/16 01:47:36 INFO hdfs.DFSClient: Exception in
>>> createBlockOutputStream
>>> java.net.NoRouteToHostException: No route to host
>>> 09/01/16 01:47:36 INFO hdfs.DFSClient: Abandoning block
>>> blk_742770163755453348_1216
>>> 09/01/16 01:47:42 WARN hdfs.DFSClient: DataStreamer Exception:
>>> java.io.IOException: Unable to create new block.
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>>
>>> 09/01/16 01:47:42 WARN hdfs.DFSClient: Error Recovery for block
>>> blk_742770163755453348_1216 bad datanode[0] nodes == null
>>> 09/01/16 01:47:42 WARN hdfs.DFSClient: Could not get block locations.
>>> Aborting...
>>> copyFromLocal: No route to host
>>> Exception closing file
>>> /user/hadoop/jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav
>>> java.io.IOException: Filesystem closed
>>>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
>>>       at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
>>>       at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
>>>       at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
>>>
>>
>>
>
>
> --
> Kumar Pandey
> http://www.linkedin.com/in/kumarpandey
>



-- 
Kumar Pandey
http://www.linkedin.com/in/kumarpandey
