Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 2 Jan 2014 21:00:50 +0000 (UTC)
From: "Rohan Pasalkar (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12687037.1388695566668.33410.1388696450881@arcas>
In-Reply-To: <JIRA.12687037.1388695566668@arcas>
References: <JIRA.12687037.1388695566668@arcas>
Subject: [jira] [Updated] (HDFS-5711) Removing memory limitation of the
 Namenode by persisting Block - Block location mappings to disk.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohan Pasalkar updated HDFS-5711:
---------------------------------

    Description:
This jira tracks the changes needed to remove the HDFS Name-node's memory limitation on holding block - block location mappings.

The single Name-node architecture of HDFS has known scalability limits. The HDFS federation project alleviates this problem through horizontal scaling, which increases both the throughput of metadata operations and the amount of data that can be stored in a Hadoop cluster.
The Name-node stores all the filesystem metadata in memory, even in the federated architecture. The Name-node design can be enhanced by persisting part of the metadata to secondary storage and retaining only the popular or recently accessed metadata in main memory. This design can benefit an HDFS deployment that doesn't use federation but needs to store a large number of files or a large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the summer of 2013, using LevelDB to persist the Namespace information (i.e. file and directory inode information).

A patch with that change has yet to be submitted to the code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution by persisting not just the Namespace information but also the Blocks Map to secondary storage.

We have implemented a basic prototype that stores the block - block location mapping metadata in a persistent key-value store, i.e. LevelDB. The prototype also maintains an in-memory cache of the recently used block - block location mappings.
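
The idea of a persistent store fronted by a cache of recently used mappings can be illustrated with a minimal sketch. This is not the prototype's code: the names BlockStore, InMemoryBlockStore and CachedBlocksMap are hypothetical, block locations are represented as plain strings, and an in-memory map stands in for LevelDB so that the example runs without any external dependency. It assumes a write-through policy and least-recently-used eviction, neither of which is stated in this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a persistent key-value store such as LevelDB. */
interface BlockStore {
    void put(long blockId, List<String> locations);
    List<String> get(long blockId); // null if the block is unknown
}

/** In-memory fake that keeps the sketch runnable; a real store would write to disk. */
class InMemoryBlockStore implements BlockStore {
    private final Map<Long, List<String>> data = new HashMap<>();
    int reads = 0; // number of lookups that reached the store

    public void put(long blockId, List<String> locations) {
        data.put(blockId, new ArrayList<>(locations));
    }

    public List<String> get(long blockId) {
        reads++;
        return data.get(blockId);
    }
}

/** Write-through LRU cache of block -> block location mappings (assumed policy). */
class CachedBlocksMap {
    private final BlockStore store;
    private final LinkedHashMap<Long, List<String>> cache;

    CachedBlocksMap(BlockStore store, final int capacity) {
        this.store = store;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<Long, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, List<String>> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Persists the mapping first, then keeps a copy in memory. */
    void put(long blockId, List<String> locations) {
        store.put(blockId, locations);
        cache.put(blockId, new ArrayList<>(locations));
    }

    /** Serves from memory when possible, otherwise loads from the store and caches the result. */
    List<String> get(long blockId) {
        List<String> cached = cache.get(blockId);
        if (cached != null) {
            return cached;
        }
        List<String> loaded = store.get(blockId);
        if (loaded != null) {
            cache.put(blockId, loaded);
        }
        return loaded;
    }

    /** Reports whether a mapping is currently held in memory (does not affect LRU order). */
    boolean isCached(long blockId) {
        return cache.containsKey(blockId);
    }
}
```

With a capacity of 2, inserting three blocks evicts the first from memory, yet a later get on it still succeeds by reading from the store. Because every put is written through, evicting an entry never loses data; the trade-off is that each update pays the cost of a store write.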

References:
[1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation


  was:
This jira acts as an umbrella jira to track all the improvements we've done recently to improve Namenode's performance, responsiveness, and hence scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operations while reportDiff is in progress in processing block reports (HDFS-2490)
4. More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock (HDFS-2495)
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure to the master rack where NN is located by lowering read priority of the replicas on the rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of path directory to one for most namespace manipulations
11. Move logging out of write lock section.


> Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5711
>                 URL: https://issues.apache.org/jira/browse/HDFS-5711
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Rohan Pasalkar


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)