hadoop-common-dev mailing list archives

From Jitendra Pandey <jiten...@hortonworks.com>
Subject Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk
Date Mon, 30 Oct 2017 18:58:01 GMT
Hi Konstantin,
 Thank you for taking the time to review Ozone. I appreciate your comments and questions.

> There are two main limitations in HDFS
> a) The throughput of Namespace operations. Which is limited by the 
>    number of RPCs the NameNode can handle
> b) The number of objects (files + blocks) the system can maintain. 
>    Which is limited by the memory size of the NameNode.

   I agree completely. We believe Ozone attempts to address both of these issues for HDFS.
   
   Let us look at the number-of-objects problem first. Ozone directly addresses the scalability
of the number of blocks by introducing storage containers that can hold multiple blocks together.
Earlier efforts in this direction were complicated by the fact that the block manager and the
namespace are intertwined in the HDFS NameNode. There have been efforts in the past to separate
the block manager from the namespace, e.g. HDFS-5477. Ozone addresses this problem by cleanly
separating out the block layer. Separating the block layer also addresses file/directory
scalability, because it frees the NameNode from keeping the block map in memory.
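To make the storage-container idea concrete, here is a minimal sketch in Java. The names (StorageContainer, putBlock) and the in-memory map are illustrative assumptions, not Ozone's actual API; the point is only that the cluster-level manager tracks one container entry instead of an entry per block.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a storage container holds many blocks, so central
// metadata scales with the number of containers, not the number of blocks.
public class StorageContainerSketch {

    static class StorageContainer {
        final long containerId;
        final Map<Long, byte[]> blocks = new HashMap<>(); // blockId -> data

        StorageContainer(long id) { this.containerId = id; }

        void putBlock(long blockId, byte[] data) { blocks.put(blockId, data); }
    }

    public static void main(String[] args) {
        // One container absorbs many blocks.
        StorageContainer c = new StorageContainer(1L);
        for (long b = 0; b < 1000; b++) {
            c.putBlock(b, new byte[0]);
        }
        // Central metadata shrinks from 1000 block entries to 1 container entry.
        System.out.println("blocks in container: " + c.blocks.size());
        System.out.println("entries tracked centrally: 1");
    }
}
```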
   
   A separate block layer also relieves the NameNode from handling block reports, incremental
block reports (IBRs), heartbeats, the replication monitor, etc. This reduces contention on the
FSNamesystem lock and significantly reduces GC pressure on the NameNode. These improvements
will greatly help the RPC performance of the NameNode.
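The heap and GC point can be made concrete with a back-of-envelope estimate. This sketch assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per block object; both numbers are illustrative assumptions, not measurements.

```java
// Rough illustration of how much NameNode heap the block map can occupy,
// assuming ~150 bytes per block object (a common rule of thumb, not a
// measured figure). Moving the block map out of the NameNode frees this
// heap, which is also that much less long-lived data for the GC to scan.
public class BlockMapHeapEstimate {
    public static void main(String[] args) {
        long blocks = 500_000_000L;   // assumed block count for a large cluster
        long bytesPerBlock = 150L;    // assumed per-block heap cost
        long heapGB = blocks * bytesPerBlock / 1_000_000_000L;
        System.out.println("approx heap for block map: " + heapGB + " GB");
    }
}
```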

> Ozone is probably just the first step in rebuilding HDFS under a new
> architecture. With the next steps presumably being HDFS-10419 and 
> HDFS-11118. The design doc for the new architecture has never been 
> published.

  We do believe that the NameNode can leverage Ozone's storage container layer; however, that
is also a big effort. We would like to first stabilize the block layer in Ozone before taking
that up. That said, we would certainly support any community effort on that front, and in fact
it was brought up in the last BoF session at the summit.

  Big data is evolving rapidly. We see our customers needing scalable file systems, object
stores (like S3), and block stores (for Docker and VMs). Ozone improves HDFS in two ways: it
addresses the throughput and scale issues of HDFS, and it enriches HDFS with newer capabilities.


> Ozone is a big enough system to deserve its own project.

I took a quick look at the core code in Ozone, and the cloc command reports 22,511 lines of
functional changes in Java.

The patch also brings in web-framework code such as Angular.js, along with a number of CSS and
JS files that contribute to the size of the patch; the rest are test and documentation changes.

I hope this addresses your concerns.

Best regards,
jitendra

On 10/28/17, 2:00 PM, "Konstantin Shvachko" <shv.hadoop@gmail.com> wrote:

    Hey guys,
    
    It is an interesting question whether Ozone should be a part of Hadoop.
    There are two main reasons why I think it should not.
    
    1. With close to 500 sub-tasks, with 6 MB of code changes, and with a
    sizable community behind, it looks to me like a whole new project.
    It is essentially a new storage system, with different (than HDFS)
    architecture, separate S3-like APIs. This is really great - the World sure
    needs more distributed file systems. But it is not clear why Ozone should
    co-exist with HDFS under the same roof.
    
    2. Ozone is probably just the first step in rebuilding HDFS under a new
    architecture. With the next steps presumably being HDFS-10419 and
    HDFS-11118.
    The design doc for the new architecture has never been published. I can
    only assume based on some presentations and personal communications that
    the idea is to use Ozone as a block storage, and re-implement NameNode, so
    that it stores only a partial namespace in memory, while the bulk of it
    (cold data) is persisted to a local storage.
    Such architecture makes me wonder if it solves Hadoop's main problems.
    There are two main limitations in HDFS:
      a. The throughput of Namespace operations. Which is limited by the number
    of RPCs the NameNode can handle
      b. The number of objects (files + blocks) the system can maintain. Which
    is limited by the memory size of the NameNode.
    The RPC performance (a) is more important for Hadoop scalability than the
    object count (b). The read RPCs being the main priority.
    The new architecture targets the object count problem, but at the expense
    of the RPC throughput. Which seems to be a wrong resolution of the tradeoff.
    Also based on the use patterns on our large clusters we read up to 90% of
    the data we write, so cold data is a small fraction and most of it must be
    cached.
    
    To summarize:
    - Ozone is a big enough system to deserve its own project.
    - The architecture that Ozone leads to does not seem to solve the intrinsic
    problems of current HDFS.
    
    I will post my opinion in the Ozone jira. Should be more convenient to
    discuss it there for further reference.
    
    Thanks,
    --Konstantin
    
    
    
    On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersyang@hotmail.com> wrote:
    
    > Hello everyone,
    >
    >
    > I would like to start this thread to discuss merging Ozone (HDFS-7240) to
    > trunk. This feature implements an object store which can co-exist with
    > HDFS. Ozone is disabled by default. We have tested Ozone with cluster sizes
    > varying from 1 to 100 data nodes.
    >
    >
    >
    > The merge payload includes the following:
    >
    >   1.  All services, management scripts
    >   2.  Object store APIs, exposed via both REST and RPC
    >   3.  Master service UIs, command line interfaces
    >   4.  Pluggable pipeline Integration
    >   5.  Ozone File System (Hadoop compatible file system implementation,
    > passes all FileSystem contract tests)
    >   6.  Corona - a load generator for Ozone.
    >   7.  Essential documentation added to Hadoop site.
    >   8.  Version specific Ozone Documentation, accessible via service UI.
    >   9.  Docker support for ozone, which enables faster development cycles.
    >
    >
    > To build Ozone and run Ozone using Docker, please follow the instructions on
    > this wiki page:
    > https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker.
    >
    >
    > We have built a passionate and diverse community to drive this feature
    > development. As a team, we have achieved significant progress in the past 3
    > years since the first JIRA for HDFS-7240 was opened in Oct 2014. So far, we
    > have resolved almost 400 JIRAs by 20+ contributors/committers from
    > different countries and affiliations. We also want to thank the large
    > number of community members who were supportive of our efforts and
    > contributed ideas and participated in the design of ozone.
    >
    >
    > Please share your thoughts, thanks!
    >
    >
    > -- Weiwei Yang
    >
    
