hadoop-hdfs-issues mailing list archives

From "Rohan Pasalkar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
Date Thu, 02 Jan 2014 21:00:50 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohan Pasalkar updated HDFS-5711:
---------------------------------

    Description: 
This jira tracks the changes needed to remove the memory limitation that holding
block - block location mappings places on the HDFS Name-node.

The single Name-node architecture of HDFS has known scalability limits. The HDFS
federation project alleviates this problem through horizontal scaling, which increases
both the throughput of metadata operations and the amount of data that can be stored
in a Hadoop cluster.

The Name-node stores all filesystem metadata in memory, even in the federated
architecture. The Name-node design can be enhanced by persisting part of the metadata
to secondary storage and retaining only the popular or recently accessed metadata in
main memory. This design can benefit an HDFS deployment that does not use federation
but needs to store a large number of files or blocks. Lin Xiao from Hortonworks
attempted a similar project [1] in the summer of 2013, using LevelDB to persist the
Namespace information (i.e., file and directory inode information).

A patch with that change has yet to be submitted to the code base. We also intend to
use LevelDB to persist metadata, and plan to provide a complete solution by persisting
not just the Namespace information but also the Blocks Map to secondary storage.

We have implemented a basic prototype that stores the block - block location mapping
metadata in a persistent key-value store (LevelDB). The prototype also maintains an
in-memory cache of recently used block - block location mappings.
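
The following is only an illustrative sketch of the general idea (a bounded
in-memory cache of recently used mappings in front of a persistent key-value
store), not the prototype's actual code. The class name CachedBlocksMap, its
methods, and the use of String for block locations are all hypothetical, and a
plain HashMap stands in for LevelDB so that the example runs without external
libraries.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: LRU cache of block -> locations over a persistent store. */
class CachedBlocksMap {
    // Stand-in for a persistent key-value store such as LevelDB (assumption:
    // a HashMap is used here only to keep the sketch self-contained).
    private final Map<Long, List<String>> store = new HashMap<>();
    // Access-ordered map: iteration order is least recently used first.
    private final LinkedHashMap<Long, List<String>> cache;
    private int storeReads = 0;

    CachedBlocksMap(final int capacity) {
        this.cache = new LinkedHashMap<Long, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, List<String>> eldest) {
                // Evict the least recently used mapping once over capacity.
                return size() > capacity;
            }
        };
    }

    /** Write-through: persist the mapping, then keep it in the cache. */
    void put(long blockId, List<String> locations) {
        store.put(blockId, locations);
        cache.put(blockId, locations);
    }

    /** Returns the locations for a block, or null if the block is unknown. */
    List<String> get(long blockId) {
        List<String> hit = cache.get(blockId);
        if (hit != null) {
            return hit;
        }
        // Cache miss: fall back to the persistent store and re-cache the result.
        storeReads++;
        List<String> loaded = store.get(blockId);
        if (loaded != null) {
            cache.put(blockId, loaded);
        }
        return loaded;
    }

    /** Number of lookups that had to go to the backing store. */
    int storeReads() {
        return storeReads;
    }

    /** True if the mapping is currently held in memory (does not touch LRU order). */
    boolean isCached(long blockId) {
        return cache.containsKey(blockId);
    }
}
```

With a capacity of 2, inserting mappings for three blocks leaves only the two
most recent in memory; a later lookup of the evicted block is served from the
backing store and brings the mapping back into the cache.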

References:
[1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation


  was:
This jira acts as an umbrella jira to track all the improvements we've done recently to improve
Namenode's performance, responsiveness, and hence scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operations while reportDiff is in progress in processing block reports (HDFS-2490)
4. More CPU-efficient data structure for under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor, thus reducing contention for the write lock (HDFS-2495)
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure on the master rack where the NN is located by lowering the read priority of the replicas on that rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of the path directory to one for most namespace manipulations
11. Move logging out of the write lock section.




> Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5711
>                 URL: https://issues.apache.org/jira/browse/HDFS-5711
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Rohan Pasalkar



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
