hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet
Date Mon, 04 Apr 2016 21:38:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225103#comment-15225103 ]

Kihwal Lee commented on HDFS-10178:

{{TestHFlush}}: HDFS-2043. Will review the patch.
The JDK8 failures don't have logs, so they are hard to debug.
{{TestDFSClientRetries}}: timed out. It tried to restart the namenode, but that timed out as well. Without
seeing the log, it is hard to know what went wrong.
{{TestReplication}}: timed out. Datanode shutdown hung at netty shutdown.
java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
        at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
        at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
        at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:590)
        at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:503)
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:160)
        at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:70)
        at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1863)
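Not part of the patch, just an illustration: if this netty shutdown hang keeps recurring, the usual mitigation is to put an upper bound on the graceful shutdown rather than waiting indefinitely. A minimal sketch, assuming an event loop group like the one {{DatanodeHttpServer}} holds; the quiet period and timeout values are made up, not taken from any patch:

{code:java}
import java.util.concurrent.TimeUnit;
import io.netty.channel.nio.NioEventLoopGroup;

public class BoundedNettyShutdown {
  // Shut the event loop group down with an upper bound, so a stuck
  // epoll wakeup cannot hang the caller forever. quietPeriod=0 and the
  // 5s/10s timeouts here are illustrative values only.
  public static void close(NioEventLoopGroup group) {
    group.shutdownGracefully(0, 5, TimeUnit.SECONDS)
         .awaitUninterruptibly(10, TimeUnit.SECONDS);
  }
}
{code}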
{{TestBlockTokenWithDFS}}: The datanode was restarted and hit a bind exception because the old port
was still taken.
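
For reference, a minimal sketch of the bind-after-restart problem; this is generic Java, not HDFS code. A socket left in TIME_WAIT makes an immediate rebind fail unless SO_REUSEADDR is set before bind (tests often sidestep this by binding port 0 instead). The port number below is illustrative:

{code:java}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class RebindSketch {
  public static ServerSocket rebind(int port) throws Exception {
    ServerSocket ss = new ServerSocket();
    // Must be set before bind(); otherwise a prior socket in TIME_WAIT
    // on the same port makes bind() throw java.net.BindException.
    ss.setReuseAddress(true);
    ss.bind(new InetSocketAddress(port));
    return ss;
  }
}
{code}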

The test failures are not related to this patch; they pass when run on my machine.

> Permanent write failures can happen if pipeline recoveries occur for the first packet
> -------------------------------------------------------------------------------------
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, HDFS-10178.v3.patch, HDFS-10178.v4.patch,
> We have observed that a write fails permanently if the first packet doesn't go through
> properly and pipeline recovery happens. If the write op creates a pipeline, but the actual
> data packet does not reach one or more datanodes in time, the pipeline recovery will be done
> against the 0-byte partial block.
> If additional datanodes are added, the block is transferred to the new nodes. After
> the transfer, each node will have a meta file containing the header and a 0-length data block
> file. The pipeline recovery seems to work correctly up to this point, but the write fails when
> the actual data packet is resent.
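
An illustrative way to exercise this path, not the test from the attached patches: force a pipeline node out before the first data packet is sent, so recovery runs against the 0-byte partial block. A minimal sketch using {{MiniDFSCluster}}; note that which datanode lands in the file's pipeline is not deterministic, so a real test would use fault injection rather than {{stopDataNode}}:

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class FirstPacketRecoverySketch {
  public static void main(String[] args) throws Exception {
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(new Configuration()).numDataNodes(4).build();
    try {
      cluster.waitActive();
      FSDataOutputStream out =
          cluster.getFileSystem().create(new Path("/first-packet"));
      // Take a datanode down before any data packet is sent, so the
      // pipeline recovery happens against a 0-byte partial block.
      cluster.stopDataNode(0);
      out.write("first packet".getBytes(StandardCharsets.UTF_8));
      out.hflush(); // pipeline recovery kicks in for the first packet
      out.close();
    } finally {
      cluster.shutdown();
    }
  }
}
{code}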

This message was sent by Atlassian JIRA
