hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3398) Client will not retry when primaryDN is down once it's just got pipeline
Date Thu, 10 May 2012 18:24:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272598#comment-13272598 ]

Uma Maheswara Rao G commented on HDFS-3398:
-------------------------------------------

Seems to be a good catch, Brahma.

@Todd, it looks like a problem to me. When writing to the socket, if the other peer goes down, the client may treat that as a client error and exit.
How about catching the exceptions from the socket operations and setting errorIndex to 1 (treating the first node as bad)?
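
Something like the below is what I mean, just a rough sketch (the class, method and field names are made up for illustration and are not the real DFSOutputStream members; I set errorIndex to 0 here assuming a 0-based pipeline array, the main point is only that it should not stay at -1):
{code}
// Hypothetical sketch of the idea; not the actual DFSOutputStream code.
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class PipelineWriteSketch {
  private volatile boolean hasError = false;
  private volatile int errorIndex = -1;   // -1 means "not a datanode error"
  private IOException lastException;

  void writePacket(DataOutputStream blockStream, ByteBuffer buf) {
    try {
      // write out data to remote datanode
      blockStream.write(buf.array(), buf.position(), buf.remaining());
      blockStream.flush();
    } catch (IOException ioe) {
      // The primary DN went down between pipeline setup and the first write.
      // Blame a datanode (index 0, assuming a 0-based pipeline) so that
      // errorIndex != -1 and this is handled as a datanode error, letting
      // pipeline recovery rebuild the pipeline instead of the client exiting.
      errorIndex = 0;
      hasError = true;
      lastException = ioe;
    }
  }
}
{code}
That way the pipeline recovery path gets a chance to rebuild the pipeline with the remaining nodes instead of the client exiting with the IOException.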

I did not see the below check in the 205 code.
{code}
if (errorIndex == -1) { // not a datanode error
  streamerClosed = true;
}
{code}

205 code on Throwable:
{code}
} catch (Throwable e) {
  LOG.warn("DataStreamer Exception: " +
           StringUtils.stringifyException(e));
  if (e instanceof IOException) {
    setLastException((IOException)e);
  }
  hasError = true;
}
{code}


In trunk:
{code}
} catch (Throwable e) {
  DFSClient.LOG.warn("DataStreamer Exception", e);
  if (e instanceof IOException) {
    setLastException((IOException)e);
  }
  hasError = true;
  if (errorIndex == -1) { // not a datanode error
    streamerClosed = true;
  }
}
{code}
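
So in trunk, when the socket write fails before errorIndex is set (it stays at -1), the streamer gets closed and the client will not retry the pipeline, whereas the 205 code above only sets hasError and does not close the streamer for that case.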

                
> Client will not retry when primaryDN is down once it's just got pipeline
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3398
>                 URL: https://issues.apache.org/jira/browse/HDFS-3398
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 2.0.0
>            Reporter: Brahma Reddy Battula
>            Priority: Minor
>
> Scenario:
> =========
> Start NN and three DNs.
> Get the datanodes to which the block has to be replicated, from
> {code}
> nodes = nextBlockOutputStream(src);
> {code}
> Before starting to write to the DN, kill the primary DN.
> {code}
> // write out data to remote datanode
>           blockStream.write(buf.array(), buf.position(), buf.remaining());
>           blockStream.flush();
> {code}
> Now the write will fail with the exception:
> {noformat}
> 2012-05-10 14:21:47,993 WARN  hdfs.DFSClient (DFSOutputStream.java:run(552)) - DataStreamer Exception
> java.io.IOException: An established connection was aborted by the software in your host machine
> 	at sun.nio.ch.SocketDispatcher.write0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.write(Unknown Source)
> 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
> 	at sun.nio.ch.IOUtil.write(Unknown Source)
> 	at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> 	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
> 	at java.io.BufferedOutputStream.write(Unknown Source)
> 	at java.io.DataOutputStream.write(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:513)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
