Date: Sat, 2 Apr 2016 01:28:25 +0000 (UTC)
From: "Arpit Agarwal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

    [ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222628#comment-15222628 ]

Arpit Agarwal commented on HDFS-10178:
--------------------------------------

+1 from me. Not committing it since Masatake has an open question.
> Permanent write failures can happen if pipeline recoveries occur for the first packet
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
> We have observed that writes fail permanently if the first packet doesn't go through properly and pipeline recovery happens. If the packet header is sent out, but the data portion of the packet does not reach one or more datanodes in time, the pipeline recovery will be done against the 0-byte partial block.
> If additional datanodes are added, the block is transferred to the new nodes. After the transfer, each node will have a meta file containing the header and a 0-length data block file. The pipeline recovery seems to work correctly up to this point, but the write fails when the actual data packet is resent.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
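The failure scenario quoted above can be sketched as a toy simulation. This is pure Python, not Hadoop code; the `DataNode` class and its fields are hypothetical stand-ins for the replica state (block file and meta file) described in the report, and the simulation only reproduces the described sequence of events, not HDFS's actual packet or recovery protocol:

```python
# Hypothetical simulation of the HDFS-10178 scenario: pipeline recovery
# happening against a 0-byte partial block after only the first packet's
# header was sent.

class DataNode:
    """Stand-in for a datanode's on-disk replica state."""
    def __init__(self, name):
        self.name = name
        self.block_len = None        # no replica on disk yet
        self.meta_has_header = False # meta file not started

    def receive_first_packet_header(self):
        # The packet header goes out first; per the report, the data
        # portion may never arrive in time.
        self.meta_has_header = True
        self.block_len = 0           # 0-byte partial block

def transfer_block(src, dst):
    # Per the report: when an additional datanode is added during
    # pipeline recovery, the block is transferred to it, leaving the
    # new node with a header-only meta file and a 0-length block file.
    dst.block_len = src.block_len
    dst.meta_has_header = src.meta_has_header

# Original pipeline node: header arrives, data portion is lost.
dn1 = DataNode("dn1")
dn1.receive_first_packet_header()

# Pipeline recovery adds a new datanode and transfers the partial block.
dn2 = DataNode("dn2")
transfer_block(dn1, dn2)

# Every node in the recovered pipeline now holds a header-only meta file
# plus an empty block file -- the state from which, per the report,
# resending the actual first data packet fails permanently.
for dn in (dn1, dn2):
    assert dn.block_len == 0 and dn.meta_has_header
```

The point of the sketch is that the recovery itself looks successful (both replicas are in a consistent, transferable state), which matches the report's observation that the failure only surfaces when the first data packet is resent.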