hadoop-hdfs-dev mailing list archives

From sanjay Radia <sanjayo...@gmail.com>
Subject Re: [VOTE] Merging branch HDFS-7240 to trunk
Date Mon, 05 Mar 2018 07:59:19 GMT
  Thanks for your response. 

In this email let me focus on maintenance and unnecessary impact on HDFS.
Daryn also touched on this topic and looked at the code base from the developer-impact point
of view. He appreciated that the code is separate, and I agree with his suggestion to move
it further up the src tree (e.g. hadoop-hdsl-project or hadoop-hdfs-project/hadoop-hdsl).
He also gave a good store-owner analogy: do not break things as you change and evolve the
store. Let’s look at the areas of future interaction as examples.

- NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen acknowledge the
benefit of the new block layer). We have two choices here:
 ** a) Evolve the NN so that it can interact with both the old and the new block layer.
 ** b) Fork and create a new NN that works only with the new block layer; the old NN will continue
to work with the old block layer.
There are trade-offs, but clearly option (b) has the least impact on the old HDFS code.

- Share HDSL’s netty protocol engine with the HDFS block layer. After HDSL and Ozone have
stabilized the engine, put the new netty engine in either HDFS or Hadoop common; HDSL
will use it from there. The HDFS community has been talking about moving to a better thread
model for HDFS DNs since release 0.16!!

- Shallow copy. Here HDSL needs a way to get the actual Linux file system links: the HDFS block
layer needs to provide a private, secure API to get the file names of blocks so that HDSL can
do a hard link (hence shallow copy).
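
To make the shallow-copy idea concrete, here is a minimal sketch of the hard-link step alone, using only the standard java.nio.file API. It assumes the block file's path has already been obtained somehow; the private HDFS API that would return it does not exist yet and is not modeled here. The class name, method name, and file names below are hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ShallowCopySketch {

    /**
     * Hard-links an existing block file into targetDir without copying data.
     * Both paths must live on the same file system, since hard links
     * cannot cross file system boundaries.
     */
    static Path shallowCopy(Path blockFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path link = targetDir.resolve(blockFile.getFileName());
        // createLink(link, existing) creates a new directory entry for the same file.
        return Files.createLink(link, blockFile);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a block file; the name is made up for this example.
        Path root = Files.createTempDirectory("shallow-copy");
        Path block = Files.write(root.resolve("blk_0001"),
                "block data".getBytes(StandardCharsets.UTF_8));

        Path link = shallowCopy(block, root.resolve("container"));

        // Both names now refer to the same underlying file.
        System.out.println("same file: " + Files.isSameFile(block, link));
    }
}
```

Because a hard link only works within one file system, the new block layer's directory would have to sit on the same volume as the HDFS block it links to, which is one reason the shared daemon matters for this use case.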

The first 2 examples are beneficial to existing HDFS; the maintenance burden can be minimized
and is worth the benefits (2x NN scalability!! and a more efficient protocol engine). The 3rd is
only beneficial to HDFS users who want the scalability of the new HDSL/Ozone code in a side-by-side
system; here the cost is providing a private API to access the block file name.


> On Mar 1, 2018, at 11:03 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> Hi Sanjay,
> I have different opinions about what's important and how to eventually
> integrate this code, and that's not because I'm "conveniently ignoring"
> your responses. I'm also not making some of the arguments you claim I am
> making. Attacking arguments I'm not making is not going to change my mind,
> so let's bring it back to the arguments I am making.
> Here's what it comes down to: HDFS-on-HDSL is not going to be ready in the
> near-term, and it comes with a maintenance cost.
> I did read the proposal on HDFS-10419 and I understood that HDFS-on-HDSL
> integration does not necessarily require a lock split. However, there still
> needs to be refactoring to clearly define the FSN and BM interfaces and
> make the BM pluggable so HDSL can be swapped in. This is a major
> undertaking and risky. We did a similar refactoring in 2.x which made
> backports hard and introduced bugs. I don't think we should have done this
> in a minor release.
> Furthermore, I don't know what your expectation is on how long it will take
> to stabilize HDSL, but this horizon for other storage systems is typically
> measured in years rather than months.
> Both of these feel like Hadoop 4 items: a ways out yet.
> Moving on, there is a non-trivial maintenance cost to having this new code
> in the code base. Ozone bugs become our bugs. Ozone dependencies become our
> dependencies. Ozone's security flaws are our security flaws. All of this
> negatively affects our already lumbering release schedule, and thus our
> ability to deliver and iterate on the features we're already trying to
> ship. Even if Ozone is separate and off by default, this is still a large
> amount of code that comes with a large maintenance cost. I don't want to
> incur this cost when the benefit is still a ways out.
> We disagree on the necessity of sharing a repo and sharing operational
> behaviors. Libraries exist as a method for sharing code. HDFS also hardly
> has a monopoly on intermediating storage today. Disks are shared with MR
> shuffle, Spark/Impala spill, log output, Kudu, Kafka, etc. Operationally
> we've made this work. Having Ozone/HDSL in a separate process can even be
> seen as an operational advantage since it's isolated. I firmly believe that
> we can solve any implementation issues even with separate processes.
> This is why I asked about making this a separate project. Given that these
> two efforts (HDSL stabilization and NN refactoring) are a ways out, the
> best way to get Ozone/HDSL in the hands of users today is to release it as
> its own project. Owen mentioned making a Hadoop subproject; we'd have to
> hash out what exactly this means (I assume a separate repo still managed by
> the Hadoop project), but I think we could make this work if it's more
> attractive than incubation or a new TLP.
> I'm excited about the possibilities of both HDSL and the NN refactoring in
> ensuring a future for HDFS for years to come. A pluggable block manager
> would also let us experiment with things like HDFS-on-S3, increasingly
> important in a cloud-centric world. CBlock would bring HDFS to new usecases
> around generic container workloads. However, given the timeline for
> completing these efforts, now is not the time to merge.
> Best,
> Andrew
> On Thu, Mar 1, 2018 at 5:33 PM, Daryn Sharp <daryn@oath.com.invalid> wrote:
>> I’m generally neutral and looked foremost at developer impact.  Ie.  Will
>> it be so intertwined with hdfs that each project risks destabilizing the
>> other?  Will developers with no expertise in ozone be impeded?  I
>> think the answer is currently no.  These are the intersections and some
>> concerns based on the assumption ozone is accepted into the project:
>> Common
>> Appear to be a number of superfluous changes.  The conf servlet must not be
>> polluted with specific references and logic for ozone.  We don’t create
>> dependencies from common to hdfs, mapred, yarn, hive, etc.  Common must be
>> “ozone free”.
>> Datanode
>> I expected ozone changes to be intricately linked with the existing blocks
>> map, dataset, volume, etc.  Thankfully it’s not.  As an independent
>> service, the DN should not be polluted with specific references to ozone.
>> If ozone is in the project, the DN should have a generic plugin interface
>> conceptually similar to the NM aux services.
>> Namenode
>> No impact, currently, but certainly will be…
>> Code Location
>> I don’t feel hadoop-hdfs-project/hadoop-hdfs is an acceptable location.
>> I’d rather see hadoop-hdfs-project/hadoop-hdsl, or even better
>> hadoop-hdsl-project.  This clean separation will make it easier to later
>> spin off or pull in depending on which way we vote.
>> Dependencies
>> Owen hit upon this before I could send.  Hadoop is already bursting with
>> dependencies, I hope this doesn’t pull in a lot more.
>> ––
>> Do I think ozone should be a separate project?  If we view it only as a
>> competing filesystem, then clearly yes.  If it’s a low risk evolutionary
>> step with near-term benefits, no, we want to keep it close and help it
>> evolve.  I think ozone/hdsl/whatever has been poorly marketed and an
>> umbrella term for too many technologies that should perhaps be split.  I'm
>> interested in the container block management.  I have little interest at
>> this time in the key store.
>> The usability of ozone, specifically container management, is unclear to
>> me.  It lacks basic features like changing replication factors, append, a
>> migration path, security, etc - I know there are good plans for all of it -
>> yet another goal is splicing into the NN.  That’s a lot of high priority
>> items to tackle that need to be carefully orchestrated before contemplating
>> BM replacement.  Each of those is a non-starter for (my) production
>> environment.  We need to make sure we can reach a consensus on the block
>> level functionality before rushing it into the NN.  That’s independent of
>> whether to allow it into the project.
>> The BM/SCM changes to the NN are realistically going to be contentious &
>> destabilizing.  If done correctly, the BM separation will be a big win for
>> the NN.  If ozone is out, by necessity interfaces will need to be stable
>> and well-defined but we won’t get that right for a long time.  Interface
>> and logic changes that break the other will be difficult to coordinate and
>> we’ll likely veto changes that impact the other.  If ozone is in, we can
>> hopefully synchronize the changes with less friction, but it greatly
>> increases the chances of developers riddling the NN with hacks and/or ozone
>> specific logic that makes it even more brittle.  I will note we need to be
>> vigilant against pervasive conditionals (ie. EC, snapshots).
>> In either case, I think ozone must agree to not impede current hdfs work.
>> I’ll compare hdfs to a store owner that plans to maybe retire in 5
>> years.  A potential new owner (ozone) is lined up and hdfs graciously gives
>> them no-rent space (the DN).  Precondition is help improve the store.
>> Don’t make a mess and expect hdfs to clean it up.  Don’t make renovations
>> that complicate hdfs but ignore it due to anticipation of its
>> departure/demise.  I’m not implying that’s currently happening, it’s just
>> what I don’t want to see.
>> We as a community and our customers need an evolution, not a revolution,
>> and definitively not a civil war.  Hdfs has too much legacy code rot that
>> is hard to change.  Too many poorly implemented features.   Perhaps I’m
>> overly optimistic that freshly redesigned code can counterbalance
>> performance degradations in the NN.  I’m also reluctant, but realize it is
>> being driven by some hdfs veterans that know/understand historical hdfs
>> design strengths and flaws.
>> If the initially cited issues are addressed, I’m +0.5 for the concept of
>> bringing in ozone if it's not going to be a proverbial bull in the china
>> shop.
>> Daryn
>> On Mon, Feb 26, 2018 at 3:18 PM, Jitendra Pandey <jitendra@hortonworks.com>
>> wrote:
>>>    Dear folks,
>>>           We would like to start a vote to merge HDFS-7240 branch into
>>> trunk. The context can be reviewed in the DISCUSSION thread, and in the
>>> jiras (See references below).
>>>    HDFS-7240 introduces Hadoop Distributed Storage Layer (HDSL), which is
>>> a distributed, replicated block layer.
>>>    The old HDFS namespace and NN can be connected to this new block layer
>>> as we have described in HDFS-10419.
>>>    We also introduce a key-value namespace called Ozone built on HDSL.
>>>    The code is in a separate module and is turned off by default. In a
>>> secure setup, HDSL and Ozone daemons cannot be started.
>>>    The detailed documentation is available at
>>>             https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Distributed+Storage+Layer+and+Applications
>>>    I will start with my vote.
>>>            +1 (binding)
>>>    Discussion Thread:
>>>              https://s.apache.org/7240-merge
>>>              https://s.apache.org/4sfU
>>>    Jiras:
>>>               https://issues.apache.org/jira/browse/HDFS-7240
>>>               https://issues.apache.org/jira/browse/HDFS-10419
>>>               https://issues.apache.org/jira/browse/HDFS-13074
>>>               https://issues.apache.org/jira/browse/HDFS-13180
>>>    Thanks
>>>    jitendra
>>>            On 2/13/18, 6:28 PM, "sanjay Radia" <sanjayosrc@gmail.com>
>>> wrote:
>>>                Sorry, the formatting got messed up by my email client.  Here
>>> it is again
>>>                Dear
>>>                 Hadoop Community Members,
>>>                   We had multiple community discussions, a few meetings
>>> in smaller groups and also jira discussions with respect to this thread. We
>>> express our gratitude for participation and valuable comments.
>>>                The key questions raised were the following:
>>>                1) How do the new block storage layer and OzoneFS benefit
>>> HDFS? We were asked to chalk out a roadmap towards the goal of a
>>> scalable namenode working with the new storage layer.
>>>                2) We were asked to provide a security design.
>>>                3) There were questions around stability given ozone brings
>>> in a large body of code.
>>>                4) Why can’t they be separate projects forever or merged
>>> in when production ready?
>>>                We have responded to all the above questions with detailed
>>> explanations and answers on the jira as well as in the discussions. We
>>> believe that should sufficiently address the community’s concerns.
>>>                Please see the summary below:
>>>                1) The new code base benefits HDFS scaling and a roadmap
>>> has been provided.
>>>                Summary:
>>>                  - New block storage layer addresses the scalability of
>>> the block layer. We have shown how the existing NN can be connected to the new
>>> block layer and its benefits. We have shown 2 milestones; the 1st milestone is
>>> much simpler than the 2nd milestone while giving almost the same scaling
>>> benefits. Originally we had proposed only milestone 2, and the community
>>> felt that removing the FSN/BM lock was a fair amount of work and a
>>> simpler solution would be useful.
>>>                  - We provide a new K-V namespace called Ozone FS with
>>> FileSystem/FileContext plugins to allow the users to use the new system.
>>> BTW Hive and Spark work very well on KV-namespaces on the cloud. This will
>>> facilitate stabilizing the new block layer.
>>>                  - The new block layer has a new netty based protocol
>>> engine in the Datanode which, when stabilized, can be used by the old hdfs
>>> block layer. See details below on sharing of code.
>>>                2) Stability impact on the existing HDFS code base and
>>> code separation. The new block layer and the OzoneFS are in modules that
>>> are separate from old HDFS code - currently there are no calls from HDFS
>>> into Ozone except for DN starting the new block layer module if configured
>>> to do so. It does not add instability (the instability argument has been
>>> raised many times). Over time as we share code, we will ensure that the old
>>> HDFS continues to remain stable (for example, we plan to stabilize the new
>>> netty based protocol engine in the new block layer before sharing it with
>>> HDFS’s old block layer).
>>>                3) In the short term and medium term, the new system and
>>> HDFS will be used side-by-side by users: side-by-side usage in the short
>>> term for testing, and side-by-side in the medium term for actual production
>>> use till the new system has feature parity with old HDFS. During this time,
>>> sharing the DN daemon and admin functions between the two systems is
>>> operationally important:
>>>                  - Sharing DN daemon to avoid additional operational
>>> daemon lifecycle management
>>>                  - Common decommissioning of the daemon and DN: One place
>>> to decommission for a node and its storage.
>>>                  - Replacing failed disks and internal balancing capacity
>>> across disks - this needs to be done for both the current HDFS blocks and
>>> the new block-layer blocks.
>>>                  - Balancer: we would like to use the same balancer and
>>> provide a common way to balance and common management of the bandwidth used
>>> for balancing
>>>                  - Security configuration setup - reuse existing set up
>>> for DNs rather than a new one for an independent cluster.
>>>                4) Need to easily share the block layer code between the
>>> two systems when used side-by-side. Areas where sharing code is desired
>>> over time:
>>>                  - Sharing new block layer’s new netty based protocol
>>> engine for old HDFS DNs (a long time sore issue for HDFS block layer).
>>>                  - Shallow data copy from old system to new system is
>>> practical only if within same project and daemon; otherwise we have to deal
>>> with security settings and coordination across daemons. Shallow copy is
>>> useful as customers migrate from old to new.
>>>                  - Shared disk scheduling in the future and in the short
>>> term have a single round robin rather than independent round robins.
>>>                While sharing code across projects is technically possible
>>> (anything is possible in software), it is significantly harder, typically
>>> requiring cleaner public apis etc. Sharing within a project through
>>> internal APIs is often simpler (such as the protocol engine that we want to
>>> share).
>>>                5) Security design, including a threat model and the
>>> solution has been posted.
>>>                6) Temporary Separation and merge later: Several of the
>>> comments in the jira have argued that we temporarily separate the two code
>>> bases for now and then later merge them when the new code is stable:
>>>                  - If there is agreement to merge later, why bother
>>> separating now - there need to be good reasons to separate now. We
>>> have addressed the stability and separation of the new code from existing
>>> above.
>>>                  - Merging the new code back into HDFS later will be harder.
>>>                    ** The code and goals will diverge further.
>>>                    ** We will be taking on extra work to split and then
>>> take extra work to merge.
>>>                    ** The issues raised today will be raised all the same
>>> then.
>>>                ---------------------------------------------------------------------
>>>                To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>>>                For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>> --
>> Daryn

