hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)
Date Fri, 16 Mar 2018 23:32:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403116#comment-16403116
] 

Sanjay Radia commented on HDFS-10419:
-------------------------------------

In the "[VOTE] Merging branch HDFS-7240 to trunk" thread, [~andrew.wang] asked:
{quote}*Sanjay says*:
> - NN on top HDSL where the NN uses the new block layer (Both Daryn and Owen acknowledge
> the benefit of the new block layer). We have two choices here
> ** a) Evolve NN so that it can interact with both old and new block layer,
> ** b) Fork and create new NN that works only with new block layer, the old NN will
> continue to work with old block layer.
> There are trade-offs but clearly the 2nd option has least impact on the old HDFS code.

*Andrew asks*: Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?
{quote}
Originally I would have preferred (a), but Owen made a strong case for (b) in my discussions
with him last week. I believe the choice between (a) and (b) will depend strongly on what we
want to do. For example, if we do milestone-1, get the 2x scalability, and decide to stop there,
then clearly go with option (a): it will require little refactoring, and one can run old and new
HDFS side-by-side. If you are planning to follow up milestone-1 with, say, caching the working
set of the namespace, then forking the NN code (i.e., option b) might be better, and the new
NN will have to keep pulling over features and bug fixes from the old NN. Konstantine has
proposed other alternatives, and we would evaluate (a) or (b) for his alternative. I
am not locked into any particular path or how we would do it.

 

> Building HDFS on top of new storage layer (HDSL)
> ------------------------------------------------
>
>                 Key: HDFS-10419
>                 URL: https://issues.apache.org/jira/browse/HDFS-10419
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>            Priority: Major
>         Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the metadata.
The storage container layer provides an object storage interface and aims to manage data/metadata
in a distributed manner. More details about storage containers can be found in the design
doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage containers.
> # The block management work can be separated out of the NameNode and will be handled
by the storage container layer in a more distributed way. The NameNode will only manage the
namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which are used
as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage container layer
to read/write.
> HDFS, especially the NameNode, can get much better scalability from this design. Currently
the NameNode's heaviest workload comes from block management, which includes maintaining
the block-DataNode mapping, receiving full/incremental block reports, tracking block states
(under/over/mis-replicated), and participating in every write pipeline protocol to guarantee
data consistency. This work brings a high memory footprint and makes the NameNode suffer from GC.
HDFS-5477 already proposes to convert the BlockManager into a service. If we can build HDFS on top
of the storage container layer, we not only separate the BlockManager out of the NameNode,
but also replace it with a new distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the work proposed
here is still at an experimental, exploratory stage. We can do this experiment in a feature branch
so that anyone interested can get involved.
> A design doc will be uploaded later explaining more details.
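
The general idea in the issue description above can be illustrated with a small, purely hypothetical Java sketch. None of the class or method names below (BlockContainerSketch, ContainerLayer, NamespaceOnlyNameNode, Client) come from HDFS, Ozone, or the design doc; they are assumptions made up for illustration. The sketch only shows the division of labor: the NameNode stand-in records a list of block IDs per file, the container-layer stand-in stores blocks as objects keyed by block ID, and the client talks to both. Replication, DataNodes, and RPC are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these types do not exist in HDFS or Ozone.
class BlockContainerSketch {

    /** Stand-in for the storage container layer: objects keyed by block ID. */
    static class ContainerLayer {
        private final Map<Long, byte[]> objects = new HashMap<>();

        void putBlock(long blockId, byte[] data) {
            objects.put(blockId, data.clone());
        }

        byte[] getBlock(long blockId) {
            byte[] data = objects.get(blockId);
            if (data == null) {
                throw new IllegalArgumentException("unknown block ID " + blockId);
            }
            return data.clone();
        }
    }

    /** Stand-in for a NameNode that manages only the namespace. */
    static class NamespaceOnlyNameNode {
        private final Map<String, List<Long>> fileToBlockIds = new HashMap<>();
        private long nextBlockId = 1;

        /** Appends a fresh block ID to the file's block list and returns it. */
        long allocateBlock(String path) {
            long id = nextBlockId++;
            fileToBlockIds.computeIfAbsent(path, p -> new ArrayList<>()).add(id);
            return id;
        }

        List<Long> getBlockIds(String path) {
            List<Long> ids = fileToBlockIds.get(path);
            if (ids == null) {
                throw new IllegalArgumentException("no such file " + path);
            }
            return new ArrayList<>(ids);
        }
    }

    /** Stand-in for a client that talks to both the NameNode and the container layer. */
    static class Client {
        private final NamespaceOnlyNameNode nameNode;
        private final ContainerLayer containers;

        Client(NamespaceOnlyNameNode nameNode, ContainerLayer containers) {
            this.nameNode = nameNode;
            this.containers = containers;
        }

        /** Splits data into blocks; IDs go to the NameNode, bytes to the container layer. */
        void write(String path, byte[] data, int blockSize) {
            for (int off = 0; off < data.length; off += blockSize) {
                int end = Math.min(off + blockSize, data.length);
                long id = nameNode.allocateBlock(path);
                containers.putBlock(id, Arrays.copyOfRange(data, off, end));
            }
        }

        /** Looks up the block IDs, then fetches each block by key and concatenates them. */
        byte[] read(String path) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (long id : nameNode.getBlockIds(path)) {
                byte[] block = containers.getBlock(id);
                out.write(block, 0, block.length);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) {
        NamespaceOnlyNameNode nameNode = new NamespaceOnlyNameNode();
        ContainerLayer containers = new ContainerLayer();
        Client client = new Client(nameNode, containers);

        byte[] data = "hello block layer".getBytes(StandardCharsets.UTF_8);
        client.write("/demo/file", data, 4);

        System.out.println("block IDs: " + nameNode.getBlockIds("/demo/file"));
        System.out.println(new String(client.read("/demo/file"), StandardCharsets.UTF_8));
    }
}
```

In this toy version the NameNode stand-in never sees block contents or locations, which is the separation the description is after; how the real container layer would track replicas and block state is not covered here.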



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
