Date: Sat, 21 Jan 2012 00:34:40 +0000 (UTC)
From: "Todd Lipcon (Updated) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-2691) HA: Tests and fixes for pipeline targets and replica recovery

[ 
https://issues.apache.org/jira/browse/HDFS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2691:
------------------------------

    Attachment: hdfs-2691.txt

The attached patch is what I've been testing with on a cluster running HBase for a little while. The approach is to send RBW (replica being written) replicas as part of the "block received and deleted" reports.

There are a couple of potential optimizations we could make here:
1) only do this when HA is enabled?
2) change the client so that, when it hflushes, it sends a flag to the DN that causes the DN to report an RBW replica (so this only happens for blocks that are hsynced/hflushed)
3) only send these reports when a failover is detected (as discussed above)

I would really appreciate feedback on the correct design here. I also plan to continue testing this. There's still some weirdness where RBW replicas show up as "corrupt" for a short while after a failover, but then seem to fix themselves with no further effort; maybe it's just a metrics thing.

> HA: Tests and fixes for pipeline targets and replica recovery
> -------------------------------------------------------------
>
>                 Key: HDFS-2691
>                 URL: https://issues.apache.org/jira/browse/HDFS-2691
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2691.txt, hdfs-2691.txt
>
>
> Currently there are some TODOs around the pipeline/recovery code in the HA branch. For example, commitBlockSynchronization only gets sent to the active NN, which may have failed over by that point. So we need to write some tests here and figure out what the correct behavior is.
> Another related area is the treatment of targets in the pipeline. When a pipeline is created, the active NN adds the "expected locations" to the BlockInfoUnderConstruction, but the DN identifiers aren't logged with the OP_ADD.
> So after a failover, the BlockInfoUnderConstruction will have no targets, and I imagine replica recovery would run into some issues.
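The reporting approach from the comment, and the three ways of gating it, can be sketched as a small Java model. This is a hypothetical illustration, not the HDFS-2691 patch: `RbwReportSketch`, `Replica`, `ReplicaState`, `RbwPolicy`, `shouldReportRbw`, and `buildReport` are invented names, the `hflushed` flag stands in for the proposed client-to-DN hint, and an incremental report is reduced to a list of block IDs. Finalized replicas are always reported; RBW replicas are included only when the chosen policy allows it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not HDFS code): building an incremental
 * "block received and deleted" report that may also carry RBW replicas.
 */
public class RbwReportSketch {

    /** Simplified replica states; real HDFS tracks more than these two. */
    enum ReplicaState { FINALIZED, RBW }

    /** The gating options proposed in the comment, plus "always report". */
    enum RbwPolicy { ALWAYS, HA_ONLY, HFLUSHED_ONLY, AFTER_FAILOVER }

    /** A replica whose state changed since the last incremental report. */
    static final class Replica {
        final long blockId;
        final ReplicaState state;
        final boolean hflushed; // assumed flag: client requested hflush/hsync on this block

        Replica(long blockId, ReplicaState state, boolean hflushed) {
            this.blockId = blockId;
            this.state = state;
            this.hflushed = hflushed;
        }
    }

    /** Decides whether an RBW replica should be included under the given policy. */
    static boolean shouldReportRbw(Replica r, RbwPolicy policy,
                                   boolean haEnabled, boolean failoverDetected) {
        switch (policy) {
            case ALWAYS:         return true;
            case HA_ONLY:        return haEnabled;
            case HFLUSHED_ONLY:  return r.hflushed;
            case AFTER_FAILOVER: return failoverDetected;
            default: throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    /** Returns the block IDs to send in the next incremental report. */
    static List<Long> buildReport(List<Replica> changed, RbwPolicy policy,
                                  boolean haEnabled, boolean failoverDetected) {
        List<Long> report = new ArrayList<>();
        for (Replica r : changed) {
            // Finalized replicas are always reported; RBW replicas only if the policy allows.
            if (r.state == ReplicaState.FINALIZED
                    || shouldReportRbw(r, policy, haEnabled, failoverDetected)) {
                report.add(r.blockId);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<Replica> changed = List.of(
                new Replica(1L, ReplicaState.FINALIZED, false),
                new Replica(2L, ReplicaState.RBW, true),
                new Replica(3L, ReplicaState.RBW, false));
        for (RbwPolicy policy : RbwPolicy.values()) {
            // HA enabled, no failover detected yet.
            System.out.println(policy + " -> " + buildReport(changed, policy, true, false));
        }
    }
}
```

With HA enabled and no failover detected, `main` prints `[1, 2, 3]` for ALWAYS and HA_ONLY, `[1, 2]` for HFLUSHED_ONLY, and `[1]` for AFTER_FAILOVER, which shows how each option trades report volume against coverage of blocks still being written.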
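The target-loss problem described in the issue can be illustrated with a toy model under stated assumptions. `FailoverTargetsSketch`, `ToyNameNode`, `BlockUnderConstruction` (a stand-in for BlockInfoUnderConstruction), and the edit log as a plain list of block IDs are all hypothetical. The active node keeps the pipeline targets only in memory and logs just the block ID, so a standby replaying the log ends up with an empty target set. In this model, RBW reports from the pipeline DataNodes then repopulate it, which is the effect the reporting approach in the comment is aiming for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model (not HDFS code) of how pipeline targets can be lost
 * on failover when the logged add op carries no DataNode identifiers.
 */
public class FailoverTargetsSketch {

    /** Stand-in for BlockInfoUnderConstruction: a block and its expected locations. */
    static final class BlockUnderConstruction {
        final long blockId;
        final Set<String> expectedLocations = new LinkedHashSet<>();

        BlockUnderConstruction(long blockId) {
            this.blockId = blockId;
        }
    }

    /** One NameNode's in-memory view of blocks under construction. */
    static final class ToyNameNode {
        final Map<Long, BlockUnderConstruction> blocks = new HashMap<>();

        /** Active side: create the block, remember targets in memory, log only the ID. */
        void addBlock(long blockId, List<String> pipeline, List<Long> editLog) {
            BlockUnderConstruction b = new BlockUnderConstruction(blockId);
            b.expectedLocations.addAll(pipeline);
            blocks.put(blockId, b);
            editLog.add(blockId); // models OP_ADD here: no DataNode identifiers are logged
        }

        /** Standby side: replaying the logged op yields a block with no targets. */
        void replay(long blockId) {
            blocks.putIfAbsent(blockId, new BlockUnderConstruction(blockId));
        }

        /** Assumed fix: an RBW report from a DataNode re-adds it as an expected location. */
        void receiveRbwReport(String dataNode, long blockId) {
            BlockUnderConstruction b = blocks.get(blockId);
            if (b != null) {
                b.expectedLocations.add(dataNode);
            }
        }

        Set<String> targets(long blockId) {
            return blocks.get(blockId).expectedLocations;
        }
    }

    public static void main(String[] args) {
        List<Long> editLog = new ArrayList<>();
        ToyNameNode active = new ToyNameNode();
        ToyNameNode standby = new ToyNameNode();
        List<String> pipeline = List.of("dn1", "dn2", "dn3");

        active.addBlock(42L, pipeline, editLog);
        for (long id : editLog) {
            standby.replay(id); // standby tails the edit log
        }
        System.out.println("active targets:    " + active.targets(42L));  // [dn1, dn2, dn3]
        System.out.println("standby targets:   " + standby.targets(42L)); // []

        // After failover, RBW reports from the pipeline DataNodes restore the targets.
        for (String dn : pipeline) {
            standby.receiveRbwReport(dn, 42L);
        }
        System.out.println("after RBW reports: " + standby.targets(42L)); // [dn1, dn2, dn3]
    }
}
```

The model deliberately ignores the other half of the issue (commitBlockSynchronization reaching only the active NN); it only shows why a standby cannot know the pipeline targets from the logged op alone.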