hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
Date Sat, 19 Oct 2013 20:47:42 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HDFS-5389:
-------------------------------

    Description: 
*Background:*
Currently, the NN keeps its entire namespace in memory. This has had the benefit that the NN
code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K
to 100K concurrent tasks. The HDFS namespace can currently be scaled by using more RAM on the
NN and/or by using Federation, which scales both namespace and performance. The current Federation
implementation does not allow renames across volumes without data copying, but there are proposals
to remove that limitation.

*Motivation:*
Hadoop lets customers store huge amounts of data at very economical prices and hence allows
customers to keep their data for several years. While most customers perform analytics on
recent data (last hour, day, week, months, quarter, year), the ability to have five-year-old
data online for analytics is very attractive for many businesses. Although one can use
larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace
in memory, since only the recent data is typically heavily accessed.

*Proposed Solution:*
Store a portion of the NN's namespace in memory: the "working set" of the applications that
are currently operating. LSM (log-structured merge) data structures are quite appropriate for
maintaining the full namespace on persistent storage while the working set stays in memory.
One choice is Google's open-source LevelDB implementation.
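
The working-set idea can be illustrated with a minimal Java sketch. It is not the design proposed in this issue; every name in it (BackingStore, WorkingSetNamespace, WorkingSetDemo) is hypothetical, the LRU eviction policy is an assumption, and BackingStore is only an in-memory stand-in for a persistent LSM store such as LevelDB. Inodes are represented as plain strings for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a persistent LSM store (e.g. LevelDB); here just a sorted map. */
class BackingStore {
    private final TreeMap<String, String> data = new TreeMap<>();
    int reads = 0; // counts lookups that had to go to the "disk" store

    void put(String path, String inode) { data.put(path, inode); }

    String get(String path) {
        reads++;
        return data.get(path);
    }
}

/** Keeps only a bounded working set of the namespace in memory (assumed LRU policy). */
class WorkingSetNamespace {
    private final BackingStore store;
    private final LinkedHashMap<String, String> cache;

    WorkingSetNamespace(BackingStore store, final int capacity) {
        this.store = store;
        // accessOrder=true orders entries from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Write-through: the full namespace always lives in the backing store. */
    void create(String path, String inode) {
        store.put(path, inode);
        cache.put(path, inode);
    }

    /** Serve from memory when possible; otherwise load from the store and cache the result. */
    String lookup(String path) {
        String inode = cache.get(path);
        if (inode == null) {
            inode = store.get(path);
            if (inode != null) {
                cache.put(path, inode);
            }
        }
        return inode;
    }

    /** Reports residency without changing the LRU order. */
    boolean isCached(String path) { return cache.containsKey(path); }
}

class WorkingSetDemo {
    public static void main(String[] args) {
        BackingStore store = new BackingStore();
        WorkingSetNamespace ns = new WorkingSetNamespace(store, 2);
        ns.create("/logs/2013/10/19", "inode-1");
        ns.create("/logs/2013/10/18", "inode-2");
        ns.create("/logs/2008/01/01", "inode-3");
        System.out.println("lookup: " + ns.lookup("/logs/2013/10/19"));
        System.out.println("store reads: " + store.reads);
    }
}
```

In this sketch every create is written through to the store, so evicting an entry from memory never loses metadata; a lookup of a cold path costs one read from the store and displaces the least recently used entry.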

*Benefits:*
 * Store larger namespaces without resorting to Federated namespace volumes.
 * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily
store multiple larger volumes.
 * Faster cold startup: the NN does not have to read its full namespace before responding to
clients.


  was:
The current HDFS Namenode stores all of its metadata in RAM. This has allowed Hadoop clusters
to scale to 100K concurrent tasks. However, memory limits the total number of files that
a single NN can store. While Federation allows one to create multiple volumes with additional
Namenodes, there is a need to scale a single namespace and also to store multiple namespaces
in a single Namenode. When inodes are also stored on persistent storage, the system's boot
time can be significantly reduced because there is no need to replay edit logs. It also provides
the potential to support extended attributes once memory size is no longer the bottleneck.



> A Namenode that keeps only a part of the namespace in memory
> ------------------------------------------------------------
>
>                 Key: HDFS-5389
>                 URL: https://issues.apache.org/jira/browse/HDFS-5389
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 0.23.1
>            Reporter: Lin Xiao
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
