Date: Tue, 22 Oct 2013 17:54:43 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

    [ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802062#comment-13802062 ]

Daryn Sharp commented on HDFS-5389:
-----------------------------------

I've actually been working on finer grain locking.

> A Namenode that keeps only a part of the namespace in memory
> ------------------------------------------------------------
>
>                 Key: HDFS-5389
>                 URL: https://issues.apache.org/jira/browse/HDFS-5389
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 0.23.1
>            Reporter: Lin Xiao
>            Priority: Minor
>
> *Background:*
> Currently, the NN keeps all its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can currently be scaled by using more RAM on the NN and/or by using Federation, which scales both namespace and performance. The current Federation implementation does not allow renames across volumes without data copying, but there are proposals to remove that limitation.
> *Motivation:*
> Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, months, quarter, year), the ability to have five-year-old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace in memory, since only the recent data is typically heavily accessed.
> *Proposed Solution:*
> Store a portion of the NN's namespace in memory: the "working set" of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace in memory. One choice is Google's LevelDB open-source implementation.
> *Benefits:*
> * Store larger namespaces without resorting to Federated namespace volumes.
> * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily store multiple larger volumes.
> * Faster cold startup: the NN does not have to read its full namespace before responding to clients.
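
The working-set idea in the proposed solution can be illustrated with a minimal sketch. This is not code from HDFS or from this issue: the class name `WorkingSetNamespace`, its methods, the LRU eviction policy, and the plain dict standing in for an on-disk store such as LevelDB are all assumptions made for illustration only.

```python
from collections import OrderedDict


class WorkingSetNamespace:
    """Toy namespace: hot entries cached in memory, all entries in a backing store.

    Hypothetical illustration; names and policy are not taken from HDFS.
    """

    def __init__(self, backing_store, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # Stand-in for an on-disk key-value store (assumption: a plain dict).
        self.backing_store = backing_store
        self.capacity = capacity
        # path -> metadata, ordered from least to most recently used.
        self.cache = OrderedDict()
        self.misses = 0

    def put(self, path, metadata):
        # Write through, so the backing store always holds the full namespace.
        self.backing_store[path] = metadata
        self._remember(path, metadata)

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)  # mark as recently used
            return self.cache[path]
        # Cache miss: fall back to the backing store (KeyError if unknown path).
        self.misses += 1
        metadata = self.backing_store[path]
        self._remember(path, metadata)
        return metadata

    def _remember(self, path, metadata):
        self.cache[path] = metadata
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry


store = {}
ns = WorkingSetNamespace(store, capacity=2)
ns.put("/logs/2008/01/01", {"blocks": 1})  # old data, soon evicted from memory
ns.put("/logs/2013/10/22", {"blocks": 3})
ns.put("/logs/2013/10/21", {"blocks": 5})
old = ns.get("/logs/2008/01/01")           # served from the backing store
```

In this sketch the lookup of the 2008 path is the only miss: the entry is reloaded from the backing store and displaces the least recently used 2013 entry. A real design would also have to address persistence, locking, and directory-level operations, none of which are modeled here.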