Date: Wed, 28 Nov 2012 00:54:58 +0000 (UTC)
From: "Suresh Srinivas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <1121056702.30533.1354064098366.JavaMail.jiratomcat@arcas>
In-Reply-To: <64203695.18862.1346367007675.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline

[ https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505119#comment-13505119 ]

Suresh Srinivas commented on HDFS-3875:
---------------------------------------

bq. while it's a nasty corruption issue, I don't think it's anything new...

I think it's a good idea to keep this as a blocker even though the issue is not a new one, given that it is a corruption issue. Nicholas, any comments on whether this applies to the old pipeline vs. the new pipeline?

> Issue handling checksum errors in write pipeline
> ------------------------------------------------
>
> Key: HDFS-3875
> URL: https://issues.apache.org/jira/browse/HDFS-3875
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 2.0.2-alpha
> Reporter: Todd Lipcon
> Assignee: Kihwal Lee
> Priority: Blocker
>
> We saw this issue with one block in a large test cluster. The client is storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it received from the first node. We don't know if the client sent a bad checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline)
> - this replica was later determined to be corrupt by the block scanner, and unrecoverable since it is the only replica
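
The failure sequence in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions, not HDFS code: the names `Packet`, `write_through_pipeline` and `corrupt_before` are hypothetical, CRC32 from Python's `zlib` stands in for whatever checksum the real pipeline uses, and the model assumes that only the last node in the pipeline verifies the checksum. It shows why the second node's exception alone cannot tell whether the first node's replica is good.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    data: bytes
    crc: int  # checksum the sender computed over `data`


def make_packet(data: bytes) -> Packet:
    """Build a packet whose checksum matches its data."""
    return Packet(data, zlib.crc32(data))


def verify(packet: Packet) -> bool:
    """Recompute the CRC32 and compare it with the checksum carried in the packet."""
    return zlib.crc32(packet.data) == packet.crc


def flip_one_bit(packet: Packet) -> Packet:
    """Hypothetical fault injection: damage the data but keep the old checksum."""
    damaged = bytes([packet.data[0] ^ 0x01]) + packet.data[1:]
    return Packet(damaged, packet.crc)


def write_through_pipeline(
    packet: Packet, nodes: List[str], corrupt_before: Optional[str] = None
) -> Tuple[Dict[str, Packet], Optional[str]]:
    """Forward a packet node by node.

    Assumption of this sketch: only the tail node verifies the checksum.
    Returns the replicas stored so far and the node that reported an error.
    """
    stored: Dict[str, Packet] = {}
    for index, name in enumerate(nodes):
        if name == corrupt_before:
            packet = flip_one_bit(packet)  # corruption on the hop into `name`
        is_tail = index == len(nodes) - 1
        if is_tail and not verify(packet):
            return stored, name  # this node fails and leaves the pipeline
        stored[name] = packet
    return stored, None


original = make_packet(b"block data")

# Case A: the data is already bad when it reaches the first node.
replicas_a, detector_a = write_through_pipeline(
    original, ["dn1", "dn2"], corrupt_before="dn1"
)

# Case B: the data is damaged between the first and the second node.
replicas_b, detector_b = write_through_pipeline(
    original, ["dn1", "dn2"], corrupt_before="dn2"
)
```

In both cases the second node is the one that reports the error and drops out, leaving a single replica on the first node. In case A that surviving replica is corrupt, in case B it is intact, and nothing in the error report distinguishes the two.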