hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8286) Scaling out the namespace using KV store
Date Mon, 11 May 2015 23:38:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538884#comment-14538884 ]

Haohui Mai commented on HDFS-8286:

Sorry for the late reply. 

bq. what is the exact goal of this jira?

This jira proposes to implement HDFS on top of a KV interface. This is a prerequisite for
evolving HDFS in several directions, such as storing the namespace in an LSM or distributing
the namespace. Deciding which direction to go is out of the scope of this jira.

bq. You probably want a support for a more generic notion of a Key.

Agreed. I want to highlight that it is important to choose an encoding scheme that matches the
characteristics of the underlying KV store for best efficiency. For example, the described scheme
focuses on reducing the memory footprint, while our prototype on top of LevelDB relies on the
locality of the KV pairs for best efficiency.

As a result, we make the encoding scheme pluggable. There is a thin layer that translates
the primitives (e.g., {{getChild()}}, {{listChildren()}}) into KV operations. Different KV
stores should plug in their own encoding schemes.
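
To make the idea concrete, here is a minimal sketch of such a translation layer. It is not the
code from the prototype or the design doc. The names {{KeyEncoding}}, {{ParentIdNameEncoding}},
and {{KvNamespace}} are hypothetical, the key layout (parent inode id followed by the child name)
is only one possible locality-friendly encoding, and a {{TreeMap}} stands in for a sorted KV
store such as LevelDB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plug-in point: decides how namespace primitives map to KV keys.
interface KeyEncoding {
    byte[] childKey(long parentId, String name);
    byte[] childPrefix(long parentId);
}

// Assumed encoding: 8-byte big-endian parent inode id, then the UTF-8 child name.
// In a sorted store, all children of one directory end up adjacent to each other.
final class ParentIdNameEncoding implements KeyEncoding {
    @Override
    public byte[] childPrefix(long parentId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(parentId).array();
    }

    @Override
    public byte[] childKey(long parentId, String name) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + n.length).putLong(parentId).put(n).array();
    }
}

// Thin layer that turns getChild()/listChildren() into KV operations.
final class KvNamespace {
    private final KeyEncoding enc;
    // Stand-in for a sorted KV store; keys compare as unsigned bytes, lexicographically.
    private final TreeMap<byte[], Long> kv =
        new TreeMap<byte[], Long>((a, b) -> Arrays.compareUnsigned(a, b));

    KvNamespace(KeyEncoding enc) {
        this.enc = enc;
    }

    void addChild(long parentId, String name, long childId) {
        kv.put(enc.childKey(parentId, name), childId);
    }

    // getChild() becomes a single point lookup; returns null if there is no such child.
    Long getChild(long parentId, String name) {
        return kv.get(enc.childKey(parentId, name));
    }

    // listChildren() becomes a range scan over all keys sharing the parent's prefix.
    List<String> listChildren(long parentId) {
        byte[] prefix = enc.childPrefix(parentId);
        List<String> names = new ArrayList<>();
        for (Map.Entry<byte[], Long> e : kv.tailMap(prefix, true).entrySet()) {
            byte[] key = e.getKey();
            if (!startsWith(key, prefix)) {
                break; // left the contiguous run of this directory's children
            }
            names.add(new String(key, prefix.length, key.length - prefix.length,
                                 StandardCharsets.UTF_8));
        }
        return names;
    }

    private static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

final class KvNamespaceDemo {
    public static void main(String[] args) {
        KvNamespace ns = new KvNamespace(new ParentIdNameEncoding());
        ns.addChild(1L, "user", 2L);
        ns.addChild(1L, "tmp", 3L);
        System.out.println(ns.getChild(1L, "user"));
        System.out.println(ns.listChildren(1L));
    }
}
```

A store with different characteristics (for example, one that optimizes for memory footprint
rather than key locality) would supply a different {{KeyEncoding}} and possibly a different
way to enumerate children, while the callers of {{getChild()}} and {{listChildren()}} stay
unchanged.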

bq. What motivates the choice of levelDB?

It serves the purpose of prototyping well when scaling the namespace beyond a single NN's
heap. Note that there are several technical roadblocks that need to be overcome in order to
implement HDFS on top of either an LSM or a distributed KV store. For example, no I/O operations
should happen inside the global lock we have today.

Again, the purpose of this jira is to implement the HDFS namespace on top of a KV store to
provide a common ground, and to identify and clear these roadblocks to allow HDFS to scale
beyond the heap of a single NN. The design aims to work well with both LSM-based and distributed
KV stores. As the encoding scheme is pluggable, we defer the discussion of choosing a particular
KV store implementation.

bq. It is not clear what is proposed for a distributed namespace, if anything?

We have not explored this area yet. Your experience with Giraffa would be highly appreciated.

> Scaling out the namespace using KV store
> ----------------------------------------
>                 Key: HDFS-8286
>                 URL: https://issues.apache.org/jira/browse/HDFS-8286
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: hdfs-kv-design.pdf
> Currently the NN keeps the namespace in memory. To improve the scalability of the
> namespace, users can scale up by using more RAM or scale out using Federation (i.e., statically
> partitioning the namespace).
> We would like to remove the limitation on scaling the global namespace. Our vision is
> that HDFS should adopt a scalable underlying architecture that allows the global namespace
> to scale linearly.
> We propose to implement the HDFS namespace on top of a key-value (KV) store. Adopting
> the KV store interfaces allows HDFS to leverage the capabilities of modern KV stores and to become
> much easier to scale. Going forward, the architecture allows distributing the namespace across
> multiple machines, or storing only the working set in memory (HDFS-5389), both of which
> allow HDFS to manage billions of files using the commodity hardware available today.

This message was sent by Atlassian JIRA