hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5324) Make Namespace implementation pluggable in the namenode
Date Tue, 08 Oct 2013 17:16:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789404#comment-13789404
] 

Milind Bhandarkar commented on HDFS-5324:
-----------------------------------------

Most of the work done so far, has been on 2.x branch. We are in the process of re-branching
it to trunk, and testing it. Soon, we will post that patch.

> Make Namespace implementation pluggable in the namenode
> -------------------------------------------------------
>
>                 Key: HDFS-5324
>                 URL: https://issues.apache.org/jira/browse/HDFS-5324
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.1.1-beta
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 3.0.0
>
>
> [For more details on the proposal, and feedback from the community, please refer to the
discussions on hdfs-dev mailing list: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201310.mbox/%3CCAAQrVk4NKUtLh36pdFQmS77n%2Bfe49wcy5-DoQW7EZMPLSJPzmQ%40mail.gmail.com%3E]
> Exec Summary: For the last couple of months, we have been working on making Namespace
> implementation in the namenode pluggable. We have demonstrated that it can
> be done without major surgery on the namenode, and does not have noticeable
> performance impact. We would like to contribute it back to Apache HDFS.
> Rationale:
> In a Hadoop cluster, Namenode roughly has following main responsibilities.
> • Catering to RPC calls from clients.
> • Managing the HDFS namespace tree.
> • Managing block report, heartbeat and other communication from data nodes.
> For Hadoop clusters having large number of files and large number of nodes,
> name node gets bottlenecked. Mainly for two reasons
> • All the information is kept in name node’s main memory.
> • Namenode has to cater to all the request from clients / data nodes.
> • And also perform some operations for backup and check pointing node.
> A possible solution is to add more main memory but there are certain issues
> with this approach
> • Namenode being Java application, garbage collection cycles execute
> periodically to reclaim unreferenced heap space. When the heap space grows
> very large, despite of GC policy  chosen, application stalls during the GC
> activity. This creates a bunch of issues since DNs and  clients may
> perceive this stall as NN crash.
> • There will always be a practical limit on how much physical memory a
> single machine can accommodate.
> Proposed Solution:
> Out of the three responsibilities listed above, we can refactor namespace
> management from the namenode codebase in such a way that there is provision
> to implement and plug other name systems other than existing in-process
> memory-based name system. Particularly a name system backed by a
> distributed key-value store will significantly reduce namenode memory
> requirement.To achieve this, a new generic interface will be introduced
> [Let’s call it AbstractNameSystem] which defines set of operations using
> which we perform the namespace management. Namenode code that used to
> manipulate some java objects maintained in namenode’s heap will now operate
> on this interface. There will be provision for others to extend this
> interface and plug their own NameSystem implementation.
> To get started, we have implemented the same memory-based namespace
> implementation in a remote process, outside of the namenode JVM. In
> addition, work is undergoing to implement the namesystem using HBase.
> Details of Changes:
> Created new class called AbstractNamesystem, existing FSNamesystem is a
> subclass of this class. Some code from FSNamesystem has been moved to its
> parent. Created a Factory class to create object of NS management
> class.Factory refers to newly added config properties to support pluggable
> name space management class. Added unit tests for Factory. Replaced
> constructors with factory calls, this is  because the namesystem instances
> should now be created based on configuration. Added new config properties
> to support pluggable name space management class. This property will decide
> which Namesystem class will be instantiated by the factory. This change is
> also reflected in some DFS related webapps [JSP files] where namesystem
> instance is used to obtain DFS health and other stats.
> These changes aim to make the namesystem pluggable without changing high
> level interfaces, this is particularly tricky since memory-based name
> system functionality is currently baked into these interfaces, and ultimate
> goal is to make the high level interface free from memory-based name system.
> Consideration for Upgrade and Rollback:
> Current memory based implementation already has code to read from and write
> to fsimage , we will have to make them publicly accessible which will
> enable us to upgrade an existing cluster from FSNamespace to newly added
> name system in future version.
> a. Upgrades: By making use of existing Loader class for reading fsimage we
> can write some code load this image into the future name system
> implementation.
> b. Rollback: Are even simpler, we can preserve the old fsimage and start
> the cluster with that image by configuring the cluster to use current file
> system based name system.
> Future work
> Current HDFS design is such that FSNameSystem is baked into even high level
> interfaces, this is a major hurdle in cleanly implementing pluggable name
> systems. We aim to propose a change in such interfaces into which
> FSNameSystem is tightly coupled.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message