hadoop-hdfs-dev mailing list archives

From Jitendra Pandey <jiten...@hortonworks.com>
Subject Re: [VOTE] Merging branch HDFS-7240 to trunk
Date Tue, 06 Mar 2018 22:32:06 GMT
Hi Andrew, 
 I think we can eliminate the maintenance costs even in the same repo. We can make the following
changes, which incorporate suggestions from Daryn and Owen as well.
1. Hadoop-hdsl-project will be at the root of hadoop repo, in a separate directory.
2. There will be no dependencies from common, yarn and hdfs to hdsl/ozone.
3. Based on Daryn’s suggestion, Hdsl can optionally (via config) be loaded in the DN
as a pluggable module.
     If not loaded, there will be absolutely no code path through hdsl or ozone.
4. To make it even easier for folks building hadoop, we can support a maven profile for
hdsl/ozone. If the profile is deactivated, hdsl/ozone will not be built.
     For example, Cloudera can choose to skip even compiling/building hdsl/ozone and therefore
incur no maintenance overhead whatsoever.
     HADOOP-14453 has a patch that shows how it can be done.
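
To make point 3 concrete, here is a minimal Java sketch of config-driven plugin loading. It is not the HDSL patch or any existing Hadoop API: the interface DataNodePlugin, the class ExamplePlugin, and the key example.datanode.plugins are all hypothetical names invented for illustration. The idea it shows is only that a module is instantiated when the config names it, and is otherwise never touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: every name here is illustrative, not Hadoop or HDSL API.
class OptionalPluginLoader {

    /** Hypothetical contract a pluggable DataNode module would implement. */
    public interface DataNodePlugin {
        String name();
    }

    /** Stand-in for an optional module such as hdsl/ozone. */
    public static class ExamplePlugin implements DataNodePlugin {
        @Override
        public String name() {
            return "example-plugin";
        }
    }

    /** Hypothetical config key holding a comma-separated list of plugin classes. */
    public static final String PLUGINS_KEY = "example.datanode.plugins";

    /**
     * Instantiates only the plugins named in the config. With the key unset,
     * no class is looked up, so no plugin code is loaded or executed.
     */
    public static List<DataNodePlugin> loadPlugins(Properties conf) {
        List<DataNodePlugin> plugins = new ArrayList<>();
        String value = conf.getProperty(PLUGINS_KEY, "");
        for (String className : value.split(",")) {
            String trimmed = className.trim();
            if (trimmed.isEmpty()) {
                continue; // key unset or stray comma: nothing to load
            }
            try {
                // Resolve the class by name and check it implements the contract.
                Class<? extends DataNodePlugin> cls =
                        Class.forName(trimmed).asSubclass(DataNodePlugin.class);
                plugins.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load plugin " + trimmed, e);
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("Default: " + loadPlugins(conf).size() + " plugin(s)");

        conf.setProperty(PLUGINS_KEY, ExamplePlugin.class.getName());
        System.out.println("Configured: " + loadPlugins(conf).get(0).name());
    }
}
```

Because the host code refers to the module only through a class name in the config, it has no compile-time dependency on the module, which is the property points 2 and 3 rely on.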

Arguably, there are two kinds of maintenance costs: costs for developers and costs for users.
- Developers: A maven profile, as noted in points (3) and (4) above, completely addresses the
concern for developers, as there are no compile-time dependencies and, further, they
can choose not to build ozone/hdsl.
- Users: The cost to users will be completely alleviated if ozone/hdsl is not loaded, as mentioned
in point (3) above.


From: Andrew Wang <andrew.wang@cloudera.com>
Date: Monday, March 5, 2018 at 3:54 PM
To: Wangda Tan <wheeleast@gmail.com>
Cc: Owen O'Malley <owen.omalley@gmail.com>, Daryn Sharp <daryn@oath.com.invalid>,
Jitendra Pandey <jitendra@hortonworks.com>, hdfs-dev <hdfs-dev@hadoop.apache.org>,
"common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>, "yarn-dev@hadoop.apache.org"
<yarn-dev@hadoop.apache.org>, "mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>
Subject: Re: [VOTE] Merging branch HDFS-7240 to trunk

Hi Owen, Wangda, 

Thanks for clearly laying out the subproject options, that helps the discussion.

I'm all on board with the idea of regular releases, and it's something I tried to do with the
3.0 alphas and betas. The problem though isn't a lack of commitment from feature developers
like Sanjay or Jitendra; far from it! I think every feature developer makes a reasonable effort
to test their code before it's merged. Yet, my experience as an RM is that more code comes
with more risk. I don't believe that Ozone is special or different in this regard. It comes
with a maintenance cost, not a maintenance benefit.

I'm advocating for #3: separate source, separate release. Since HDSL stability and FSN/BM
refactoring are still a ways out, I don't want to incur a maintenance cost now. I sympathize
with the sentiment that working cross-repo is harder than within same repo, but the right
tooling can make this a lot easier (e.g. git submodule, Google's repo tool). We have experience
doing this internally here at Cloudera, and I'm happy to share knowledge and possibly code.


On Fri, Mar 2, 2018 at 4:41 PM, Wangda Tan <wheeleast@gmail.com> wrote:
I like the idea of same source / same release, with Ozone's source under a different directory.

As Owen mentioned, it's going to be important for all parties to keep a regular and shorter release
cycle for Hadoop, e.g. 3-4 months between minor releases. Users can try features and give
feedback to stabilize them earlier; developers can be happier since their efforts will be consumed
by users soon after features get merged. In addition to this, if features are merged to trunk
after reasonable tests/review, Andrew's concern may not be a problem anymore:

bq. Finally, I earnestly believe that Ozone/HDSL itself would benefit from
being a separate project. Ozone could release faster and iterate more
quickly if it wasn't hampered by Hadoop's release schedule and security and
compatibility requirements.


On Fri, Mar 2, 2018 at 4:24 PM, Owen O'Malley <owen.omalley@gmail.com> wrote:
On Thu, Mar 1, 2018 at 11:03 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:

> Owen mentioned making a Hadoop subproject; we'd have to
> hash out what exactly this means (I assume a separate repo still managed by
> the Hadoop project), but I think we could make this work if it's more
> attractive than incubation or a new TLP.

Ok, there are multiple levels of sub-projects that all make sense:

   - Same source tree, same releases - examples like HDFS & YARN
   - Same master branch, separate releases and release branches - Hive's
   Storage API vs Hive. It is in the source tree for the master branch, but
   has distinct releases and release branches.
   - Separate source, separate release - Apache Commons.

There are advantages and disadvantages to each. I'd propose that we use the
same source, same release pattern for Ozone. Note that we tried and later
reverted doing Common, HDFS, and YARN as separate source, separate release
because it was too much trouble. I like Daryn's idea of putting it as a top
level directory in Hadoop and making sure that nothing in Common, HDFS, or
YARN depend on it. That way if a Release Manager doesn't think it is ready
for release, it can be trivially removed before the release.

One thing about using the same releases, Sanjay and Jitendra are signing up
to make much more regular bugfix and minor releases in the near future. For
example, they'll need to make 3.2 relatively soon to get it released and
then 3.3 somewhere in the next 3 to 6 months. That would be good for the
project. Hadoop needs more regular releases and fewer big bang releases.

.. Owen
