From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet
Date Mon, 04 Apr 2016 21:38:25 GMT

[ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225103#comment-15225103 ]

Kihwal Lee commented on HDFS-10178:
-----------------------------------

{{TestHFlush}}: tracked in HDFS-2043. Will review the patch.
The JDK8 failures don't have logs, so they are hard to debug.
{{TestDFSClientRetries}}: timed out. It tried to restart the namenode, but that also timed out. Without seeing the log, it's hard to know what went wrong.
{{TestReplication}}: timed out. The datanode shutdown hung during Netty shutdown:
{noformat}
java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
        at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
        at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
        at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:590)
        at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:503)
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:160)
        at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:70)
        at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1863)
{noformat}
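For reference, a minimal standalone sketch of bounding Netty's graceful shutdown so a wedged selector wakeup cannot hang the caller indefinitely. This is illustrative only, not the actual {{DatanodeHttpServer}} code, and the timeout values are arbitrary:
{code:java}
import io.netty.channel.nio.NioEventLoopGroup;
import java.util.concurrent.TimeUnit;

public class BoundedNettyShutdown {
    public static void main(String[] args) {
        NioEventLoopGroup group = new NioEventLoopGroup();
        // shutdownGracefully(quietPeriod, timeout, unit) caps how long the
        // event loop may linger, and the bounded awaitUninterruptibly keeps
        // the calling thread from blocking forever if the wakeup stalls.
        group.shutdownGracefully(0, 5, TimeUnit.SECONDS)
             .awaitUninterruptibly(10, TimeUnit.SECONDS);
    }
}
{code}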
{{TestBlockTokenWithDFS}}: The restarted datanode hit a bind exception because its old port was still in use.
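For context, a self-contained sketch of that failure mode: rebinding a port that is still held fails with {{BindException}}, which is why tests typically bind port 0 and let the OS pick a free ephemeral port. The port number below is an arbitrary example:
{code:java}
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class BindRace {
    public static void main(String[] args) throws IOException {
        ServerSocket first = new ServerSocket(50010); // fixed port, still held
        try (ServerSocket second = new ServerSocket(50010)) {
            System.out.println("unexpected: rebind succeeded");
        } catch (BindException e) {
            System.out.println("rebind failed: " + e.getMessage());
        } finally {
            first.close();
        }
        // Binding port 0 asks the OS for any free ephemeral port instead.
        try (ServerSocket any = new ServerSocket(0)) {
            System.out.println("ephemeral port: " + any.getLocalPort());
        }
    }
}
{code}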

The test failures are unrelated to this patch; they pass when run on my machine.

> Permanent write failures can happen if pipeline recoveries occur for the first packet
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that a write fails permanently if the first packet doesn't go through properly and a pipeline recovery happens. If the write op creates a pipeline, but the actual data packet does not reach one or more datanodes in time, the pipeline recovery will be done against the 0-byte partial block.
> If additional datanodes are added, the block is transferred to the new nodes. After the transfer, each node will have a meta file containing the header and a 0-length data block file. The pipeline recovery seems to work correctly up to this point, but the write fails when the actual data packet is resent.
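To make the on-disk state described above concrete, here is an illustrative sketch of the recovered replica: an empty block file plus a meta file holding only the checksum header and no checksum entries. This is not HDFS's actual {{BlockMetadataHeader}} code, and the header field values below are assumptions for illustration:
{code:java}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ZeroLengthReplicaSketch {
    public static void main(String[] args) throws IOException {
        // 0-length block file: created during recovery, no data written yet.
        new FileOutputStream("blk_0000").close();

        // Meta file: header only, no checksum entries, since the block
        // holds no data. All field values here are illustrative assumptions.
        try (DataOutputStream meta =
                 new DataOutputStream(new FileOutputStream("blk_0000.meta"))) {
            meta.writeShort(1);  // metadata layout version (assumed)
            meta.writeByte(2);   // checksum type, e.g. CRC32C (assumed)
            meta.writeInt(512);  // bytesPerChecksum (assumed)
        }
        // Per the report above, the write then fails when the client
        // resends the actual first data packet against this state.
    }
}
{code}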



