hadoop-hdfs-issues mailing list archives

From "amith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3398) Client will not retry when primaryDN is down once it's just got pipeline
Date Mon, 28 May 2012 04:35:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284287#comment-13284287 ]

amith commented on HDFS-3398:

I manually tested the patch using breakpoints in debug mode.
Steps:
1. Put breakpoints at nodes = nextBlockOutputStream() and at blockStream.write(...).
2. Note the primary DN selected by nodes = nextBlockOutputStream(), and when control
reaches the breakpoint just before blockStream.write(...), kill the primary DN.
3. The blockStream, which points to the primary DN, can no longer send data, so an
IOException is thrown.
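
The failure in step 3 can also be reproduced without a debugger. The following is a
minimal sketch, not HDFS code: KillableStream is a hypothetical stand-in for the stream
to the primary DN, and kill() plays the role of stopping that DN after the pipeline has
been obtained but before the first write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for the stream to the primary DN; not an HDFS class.
class KillableStream extends OutputStream {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private boolean killed = false;

    // Simulates killing the primary DN after the pipeline was obtained.
    void kill() {
        killed = true;
    }

    @Override
    public void write(int b) throws IOException {
        if (killed) {
            throw new IOException("connection aborted (simulated DN kill)");
        }
        sink.write(b);
    }

    // Returns true if the write succeeded, false if it raised an IOException.
    static boolean tryWrite(boolean killFirst) {
        KillableStream blockStream = new KillableStream();
        if (killFirst) {
            blockStream.kill();                             // step 2
        }
        try {
            blockStream.write(new byte[] {1, 2, 3}, 0, 3);  // step 3
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("write with live DN succeeded:   " + tryWrite(false));
        System.out.println("write with killed DN succeeded: " + tryWrite(true));
    }
}
```

This only shows that the write surfaces as an IOException on the client side; how the
client reacts to that exception is what the patch changes.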

Result without patch:
In the catch block hasError is set but errorIndex is not, so the failure is treated as a
client error rather than a DN error, and the client stops.

Result with patch:
We catch the IOException from the blockStream, set errorIndex to the primary DN, and
rethrow the exception. With both errorIndex=0 and hasError=true, this is treated as a
DN failure rather than a client failure, so the client tries to update its pipeline and continues writing.
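
The difference between the two results comes down to how hasError and errorIndex are
read together. The sketch below is a simplified model, not the actual DFSOutputStream
code: the class PipelineErrorSketch, its method names, and the DN names are hypothetical.
Using -1 for "no errorIndex" and dropping the failed node to model "update its pipeline"
are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the error handling described above.
class PipelineErrorSketch {
    boolean hasError = false;
    int errorIndex = -1;   // assumption: -1 means "no DN identified"
    final List<String> pipeline =
        new ArrayList<>(Arrays.asList("dn0", "dn1", "dn2"));

    // Models the write to the primary DN failing with an IOException.
    void writeToPrimary(boolean patched) throws IOException {
        try {
            throw new IOException("connection aborted (simulated)");
        } catch (IOException e) {
            if (patched) {
                errorIndex = 0;   // with patch: mark the primary DN as failed
            }
            throw e;              // rethrow so the outer handler still runs
        }
    }

    // Returns true if the client can continue writing, false if it stops.
    boolean run(boolean patched) {
        try {
            writeToPrimary(patched);
        } catch (IOException e) {
            hasError = true;      // set in both the patched and unpatched case
        }
        if (hasError && errorIndex >= 0) {
            pipeline.remove(errorIndex);   // DN failure: drop the bad node
            hasError = false;
            errorIndex = -1;
            return true;
        }
        return !hasError;         // hasError without errorIndex: client error
    }

    public static void main(String[] args) {
        PipelineErrorSketch unpatched = new PipelineErrorSketch();
        System.out.println("without patch, continues: " + unpatched.run(false));
        PipelineErrorSketch patched = new PipelineErrorSketch();
        System.out.println("with patch, continues:    " + patched.run(true)
            + ", pipeline: " + patched.pipeline);
    }
}
```

In this model the only change between the two runs is whether errorIndex is set before
the exception is rethrown, which mirrors the behaviour reported above.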

> Client will not retry when primaryDN is down once it's just got pipeline
> ------------------------------------------------------------------------
>                 Key: HDFS-3398
>                 URL: https://issues.apache.org/jira/browse/HDFS-3398
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 2.0.0-alpha
>            Reporter: Brahma Reddy Battula
>            Assignee: amith
>            Priority: Minor
>         Attachments: HDFS-3398.patch, HDFS-3398.patch, HDFS_3398_3.patch
> Scenario:
> =========
> Start NN and three DNs.
> Get the datanodes to which the block has to be replicated, from
> {code}
> nodes = nextBlockOutputStream(src);
> {code}
> Before starting to write to the DN, kill the primary DN.
> {code}
> // write out data to remote datanode
> blockStream.write(buf.array(), buf.position(), buf.remaining());
> blockStream.flush();
> {code}
> Now the write will fail with the exception:
> {noformat}
> 2012-05-10 14:21:47,993 WARN  hdfs.DFSClient (DFSOutputStream.java:run(552)) - DataStreamer
> java.io.IOException: An established connection was aborted by the software in your host
> 	at sun.nio.ch.SocketDispatcher.write0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.write(Unknown Source)
> 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
> 	at sun.nio.ch.IOUtil.write(Unknown Source)
> 	at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> 	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
> 	at java.io.BufferedOutputStream.write(Unknown Source)
> 	at java.io.DataOutputStream.write(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:513)
> {noformat}
> .

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira