Date: Tue, 28 Aug 2012 04:26:07 +1100 (NCT)
From: "Suresh Srinivas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <592975886.2068.1346088367797.JavaMail.jiratomcat@arcas>
In-Reply-To: <1359931531.90562.1343041894688.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HDFS-3705) Add the possibility to mark a node as 'low priority' for read in the DFSClient

    [ https://issues.apache.org/jira/browse/HDFS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442543#comment-13442543 ]

Suresh Srinivas commented on HDFS-3705:
---------------------------------------

My apologies for not looking at this jira for a while. The patch and the mechanism are straightforward. The one thing that made me uncomfortable, and hence made me procrastinate, is that this adds more client-side logic to what is already a thick client.
This leaves other clients, such as libhdfs or clients based on webhdfs, without equivalent functionality. I want to get your thoughts on the following:
# We can get HDFS-3703 done for the read side fairly quickly. It adds a server-side mechanism that returns the datanode list with stale nodes pushed to the end of the list. Jing is close to finishing it. It will give an effective way to deal with stale nodes for HBase.
# Alternatively, I do see value in this change. It adds a generic capability to manipulate the datanode list. Should we add it right now, or would HDFS-3703 be sufficient for HBase? If we really want to add it, we could make these APIs LimitedPrivate, to be used only by Hadoop projects (such as HBase), and make them public APIs later.

I am leaning towards the first choice. Thoughts?

> Add the possibility to mark a node as 'low priority' for read in the DFSClient
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-3705
>                 URL: https://issues.apache.org/jira/browse/HDFS-3705
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0
>            Reporter: nkeywal
>             Fix For: 3.0.0
>
>         Attachments: hdfs-3705.sample.patch, HDFS-3705.v1.patch
>
>
> This has been partly discussed in HBASE-6435.
> The DFSClient includes a 'bad nodes' management for reads and writes. Sometimes, the client application already knows that some nodes are dead or likely to be dead.
> An example is the 'HBase Write-Ahead-Log': when HBase reads this file, it knows that the HBase regionserver died, and it's very likely that the box died, so the datanode on the same box is dead as well. This is actually critical, because:
> - it's the hbase recovery that reads these log files
> - if we read them it means that we lost a box, so we have 1 dead replica out of the 3.
> - for all files read, we have a 33% chance of going to the dead datanode
> - as the box just died, we're very likely to get a timeout exception, so we're delaying the hbase recovery by 1 minute. For HBase, it means that the data is not available during this minute.
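
The reordering discussed in this thread (keep every replica location, but push nodes believed to be dead or stale to the end of the list so they are tried last) can be sketched as follows. This is only a minimal illustration under stated assumptions: the class ReplicaOrdering, the method deprioritize, and the datanode names dn1..dn3 are hypothetical, and the code is neither the HDFS client API nor what HDFS-3703 or the attached patches implement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of HDFS: datanodes are plain strings here.
public class ReplicaOrdering {

    /**
     * Returns a new list in which every node contained in lowPriority is
     * moved to the end. The relative order inside each group is preserved,
     * and no node is dropped, so a low-priority replica can still be read
     * as a last resort.
     */
    public static List<String> deprioritize(List<String> replicas, Set<String> lowPriority) {
        List<String> preferred = new ArrayList<>();
        List<String> deferred = new ArrayList<>();
        for (String node : replicas) {
            if (lowPriority.contains(node)) {
                deferred.add(node);
            } else {
                preferred.add(node);
            }
        }
        preferred.addAll(deferred);
        return preferred;
    }

    public static void main(String[] args) {
        // Three replicas; dn1 sits on the box that is assumed to have died.
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> suspected = Collections.singleton("dn1");
        System.out.println(deprioritize(replicas, suspected)); // prints [dn2, dn3, dn1]
    }
}
```

With three replicas and one dead box, an unordered pick reaches the dead datanode first about one time in three, which is the 33% figure in the description. After the reordering above, the suspected node is only tried once the other two have failed.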