hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8286) Scaling out the namespace using KV store
Date Wed, 29 Apr 2015 18:42:07 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai updated HDFS-8286:
    Attachment: hdfs-kv-design.pdf

The attachment outlines the architecture of the HDFS namespace on top of a KV store. It describes
how to encode the current namespace into a KV schema, and how to implement existing features
such as HA (high availability) and snapshots under the proposed architecture.
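The encoding itself lives in the attached PDF, so the following is only a minimal sketch of one common way to flatten a directory tree into a KV schema; it is an assumption, not the schema in hdfs-kv-design.pdf. Each inode is stored under its numeric id, and each directory entry is stored under a (parent inode id, child name) key, so resolving a path becomes a chain of point lookups. `KVNamespace`, `INode`, `ROOT_ID`, and both key layouts are hypothetical.

```python
from dataclasses import dataclass

ROOT_ID = 1  # hypothetical inode id reserved for "/"


@dataclass
class INode:
    inode_id: int
    name: str
    is_dir: bool


class KVNamespace:
    """Hypothetical encoding of a directory tree into a flat key space.

    ("inode", inode_id)        -> INode record
    ("child", parent_id, name) -> inode_id of that directory entry
    """

    def __init__(self):
        self.kv = {}  # a plain dict stands in for the KV store
        self.next_id = ROOT_ID + 1
        self.kv[("inode", ROOT_ID)] = INode(ROOT_ID, "", True)

    def resolve(self, path):
        """Follow one ("child", parent, name) lookup per path component."""
        current = ROOT_ID
        for name in (p for p in path.split("/") if p):
            current = self.kv.get(("child", current, name))
            if current is None:
                return None
        return current

    def create(self, path, is_dir=False):
        """Add an inode record plus the directory entry that points to it."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        if not name:
            raise ValueError("cannot create the root directory")
        parent_id = self.resolve(parent_path)
        if parent_id is None:
            raise FileNotFoundError(parent_path or "/")
        if not self.kv[("inode", parent_id)].is_dir:
            raise NotADirectoryError(parent_path)
        if ("child", parent_id, name) in self.kv:
            raise FileExistsError(path)
        inode_id = self.next_id
        self.next_id += 1
        self.kv[("inode", inode_id)] = INode(inode_id, name, is_dir)
        self.kv[("child", parent_id, name)] = inode_id
        return inode_id


ns = KVNamespace()
a = ns.create("/a", is_dir=True)
f = ns.create("/a/b.txt")
print(ns.resolve("/a/b.txt"))  # 3
```

Because directory entries point at inode ids rather than full paths, moving a directory in this encoding only touches the moved inode's own entries, not every key beneath it.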

Note that in the proposed design HDFS still keeps the namespace in memory to ease the
migration; that is, the implementation will be based on an in-memory KV store. Preliminary
evaluations of our prototype show that the architecture has memory usage and performance
comparable to those of HDFS today.
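Since the namespace stays in memory, the KV layer can start as a plain in-memory ordered map. The sketch below assumes a hypothetical `InMemoryKV` interface with get/put/delete and a prefix scan; it is an illustration, not the prototype's actual store. If directory entries are keyed by (parent inode id, name), which is itself an assumption, listing a directory becomes a single range scan over adjacent keys.

```python
import bisect


class InMemoryKV:
    """Hypothetical ordered in-memory KV store with a prefix scan.

    A dict serves point lookups; a sorted list of keys serves range scans.
    Keys are tuples such as (parent_inode_id, child_name).
    """

    def __init__(self):
        self._data = {}
        self._keys = []  # kept sorted at all times

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(n) list insertion
        self._data[key] = value

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key tuple starts with `prefix`, in key order."""
        result = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            key = self._keys[i]
            result.append((key, self._data[key]))
            i += 1
        return result


# Directory entries keyed by (parent inode id, name); "/" is assumed to be inode 1.
kv = InMemoryKV()
kv.put((1, "b"), 3)
kv.put((1, "a"), 2)
kv.put((2, "c"), 4)
children = [name for (_, name), _ in kv.scan_prefix((1,))]
print(children)  # ['a', 'b']
```

A sorted Python list keeps the sketch short but makes `put` and `delete` O(n); a production store would more likely back the ordered index with a balanced tree or skip list.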

This jira can be seen as the Phase I implementation of HDFS-5389. In this jira we plan to
focus on faithfully implementing the features that are available in HDFS today, and to defer
migrating from this architecture toward HDFS-5389 to a later phase of implementation.

> Scaling out the namespace using KV store
> ----------------------------------------
>                 Key: HDFS-8286
>                 URL: https://issues.apache.org/jira/browse/HDFS-8286
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: hdfs-kv-design.pdf
> Currently the NameNode (NN) keeps the namespace in memory. To improve the scalability of the
namespace, users can scale up by adding more RAM or scale out using Federation (i.e., statically
partitioning the namespace).
> We would like to remove the limitation on scaling the global namespace. Our vision is
that HDFS should adopt a scalable underlying architecture that allows the global namespace
to scale linearly.
> We propose to implement the HDFS namespace on top of a key-value (KV) store. Adopting
the KV store interfaces allows HDFS to leverage the capabilities of modern KV stores and to become
much easier to scale. Going forward, the architecture allows distributing the namespace across
multiple machines, or storing only the working set in memory (HDFS-5389), both of which
allow HDFS to manage billions of files using commodity hardware available today.
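To contrast distributing the namespace with Federation's static partitioning, here is a minimal sketch of spreading directory entries across several shards by the parent inode id. The partitioning rule is an assumption for illustration (the issue does not say how keys would be distributed), the "machines" are simulated as in-process dicts, and `ShardedNamespace` is a hypothetical name. Keying on the parent id keeps all entries of one directory on one shard, while a path lookup may hop between shards at each component.

```python
class ShardedNamespace:
    """Hypothetical namespace spread over several shards (simulated as dicts).

    Each directory entry (parent_id, name) -> child_id is placed on the shard
    chosen from parent_id, so all entries of one directory share a shard.
    """

    def __init__(self, num_shards, root_id=1):
        self.shards = [{} for _ in range(num_shards)]
        self.root_id = root_id

    def _shard_for(self, parent_id):
        # A simple modulo stands in for a real hash/partition function.
        return self.shards[parent_id % len(self.shards)]

    def put_child(self, parent_id, name, child_id):
        self._shard_for(parent_id)[(parent_id, name)] = child_id

    def get_child(self, parent_id, name):
        return self._shard_for(parent_id).get((parent_id, name))

    def list_children(self, parent_id):
        # Single-shard operation; scans the whole shard for brevity.
        shard = self._shard_for(parent_id)
        return sorted(name for (pid, name) in shard if pid == parent_id)

    def resolve(self, path):
        # Each path component may live on a different shard.
        current = self.root_id
        for name in (p for p in path.split("/") if p):
            current = self.get_child(current, name)
            if current is None:
                return None
        return current


ns = ShardedNamespace(num_shards=4)
ns.put_child(1, "a", 2)  # /a
ns.put_child(2, "b", 3)  # /a/b
ns.put_child(2, "c", 4)  # /a/c
print(ns.resolve("/a/b"), ns.list_children(2))  # 3 ['b', 'c']
```

Unlike a fixed mount-point split, adding shards here only changes the placement function, although a real system would need a rebalancing scheme (for example consistent hashing) to avoid moving most keys when the shard count changes.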

This message was sent by Atlassian JIRA
