hadoop-hdfs-dev mailing list archives

From Sean Mackrory <mackror...@gmail.com>
Subject Re: [DISCUSS] A final minor release off branch-2?
Date Tue, 07 Nov 2017 19:29:36 GMT
>> You mentioned rolling-upgrades: It will be good to exactly outline the
type of testing. For e.g., the rolling-upgrades orchestration order has
direct implication on the testing done.

Complete details are available in HDFS-11096, where I'm trying to get
scripts to automate these tests committed so we can run them on Jenkins.
For HDFS, I follow the same order as the documentation. I did not see any
documentation indicating when to upgrade zkfc daemons, so it is done at the
end. I also did not see any documentation about a rolling upgrade for YARN,
so I'm doing ResourceManagers first, then NodeManagers, basically following
the pattern used in HDFS.
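
As a rough illustration only (this is not taken from the HDFS-11096
scripts), the ordering described above could be sketched as below. The
function name, role labels and host arguments are hypothetical. The sketch
also assumes that the documented HDFS order is NameNodes before DataNodes,
that "at the end" means the end of the HDFS phase, and that HDFS is upgraded
before YARN. JournalNodes and the prepare/finalize steps are left out.

```python
# Hypothetical sketch of the orchestration order described in the message.
# Not the HDFS-11096 scripts; role labels and host names are placeholders.

def rolling_upgrade_order(namenodes, datanodes, zkfcs,
                          resourcemanagers, nodemanagers):
    """Return (role, host) pairs in the order the daemons get upgraded."""
    steps = []
    # HDFS phase: NameNodes, then DataNodes (assumed documented order).
    steps += [("namenode", host) for host in namenodes]
    steps += [("datanode", host) for host in datanodes]
    # No documentation says when to upgrade zkfc, so it goes last
    # (assumption: last within the HDFS phase).
    steps += [("zkfc", host) for host in zkfcs]
    # YARN phase mirrors HDFS: master daemons first, then workers.
    steps += [("resourcemanager", host) for host in resourcemanagers]
    steps += [("nodemanager", host) for host in nodemanagers]
    return steps


if __name__ == "__main__":
    plan = rolling_upgrade_order(
        ["nn1", "nn2"], ["dn1", "dn2"], ["nn1", "nn2"], ["rm1"], ["nm1"])
    for role, host in plan:
        print(role, host)
```

Writing the order down as data like this makes it easy to state exactly
which orchestration order a given test run covered.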

I can't speak much about app compatibility in YARN, etc., but the rolling
upgrade test runs Terasuite from Hadoop 2 continually while doing the
upgrade and for some time afterward. One incompatibility was found and fixed
in trunk quite a while ago, and that part of the test has been working well
since then.

>> Copying data between 2.x clusters and 3.x clusters: Does this work
already? Is it broken anywhere that we cannot fix? Do we need bridging
features for this work?

HDFS-11096 also includes tests that data can be distcp'd over webhdfs:// to
and from old and new clusters regardless of where the distcp job is
launched from. I'll try a test run that uses hdfs:// this week, too.
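
To make the test matrix concrete, here is a minimal sketch (again, not the
HDFS-11096 scripts) that enumerates the four combinations: copying in either
direction, launched from either cluster. The function name, the "old"/"new"
labels, the cluster URIs and the test path are all hypothetical. The only
real CLI usage assumed is `hadoop distcp <source> <destination>`.

```python
# Hypothetical sketch: enumerate distcp runs between an old (2.x) and a
# new (3.x) cluster. URIs and the test path are placeholders.

def distcp_commands(old_uri, new_uri, path="/tmp/distcp-test"):
    """Return (launch_cluster, argv) pairs covering both copy directions
    from both launch locations."""
    commands = []
    for launch_cluster in ("old", "new"):
        for src, dst in ((old_uri, new_uri), (new_uri, old_uri)):
            argv = ["hadoop", "distcp", src + path, dst + path]
            commands.append((launch_cluster, argv))
    return commands


if __name__ == "__main__":
    for cluster, argv in distcp_commands("webhdfs://old-nn", "webhdfs://new-nn"):
        print("run on", cluster, "cluster:", " ".join(argv))
```

Swapping the URI scheme passed in (for example hdfs:// instead of
webhdfs://) would produce the same matrix for the other protocol.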

As part of that JIRA I also looked through all the protobufs for any
discrepancies / incompatibilities. One was found and fixed, but the rest
looked good to me.



On Mon, Nov 6, 2017 at 6:42 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org>
wrote:

> The main goal of the bridging release is to ease transition on stuff that
> is guaranteed to be broken.
>
> Off the top of my head, one of the biggest areas is application
> compatibility. When folks move from 2.x to 3.x, are their apps binary
> compatible? Source compatible? Or do they need changes?
>
> In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps
> source compatible. This means relooking at API compatibility in 3.x and
> its impact on migrating applications. We will have to revisit and
> un-deprecate old APIs, un-delete old APIs and write documentation on how
> apps can be migrated.
>
> Most of this work will be in the 3.x line. The bridging release on the
> other hand will have deprecations for APIs that cannot be undeleted. This
> may already have been done in many places. But we need to make sure and
> fill gaps if any.
>
> Other areas that I can recall from the old days
>  - Config migration: Many configs are deprecated or deleted. We need
> documentation to help folks to move. We also need deprecations in the
> bridging release for configs that cannot be undeleted.
>  - You mentioned rolling-upgrades: It will be good to exactly outline the
> type of testing. For e.g., the rolling-upgrades orchestration order has
> direct implication on the testing done.
>  - Story for downgrades?
>  - Copying data between 2.x clusters and 3.x clusters: Does this work
> already? Is it broken anywhere that we cannot fix? Do we need bridging
> features for this work?
>
> +Vinod
>
> > On Nov 6, 2017, at 12:49 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> >
> > What are the known gaps that need bridging between 2.x and 3.x?
> >
> > From an HDFS perspective, we've tested wire compat, rolling upgrade, and
> > rollback.
> >
> > From a YARN perspective, we've tested wire compat and rolling upgrade.
> > Arun just mentioned an NM rollback issue that I'm not familiar with.
> >
> > Anything else? External to this discussion, these should be documented as
> > known issues for 3.0.
> >
> > Best.
> > Andrew
> >
> > On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh <asuresh@apache.org> wrote:
> >
> >> Thanks for starting this discussion Vinod.
> >>
> >> I agree (C) is a bad idea.
> >> I would prefer (A) given that ATM, branch-2 is still very close to
> >> branch-2.9 - and it is a good time to make a collective decision to lock
> >> down commits to branch-2.
> >>
> >> I think we should also clearly define what the 'bridging' release should
> >> be.
> >> I assume it means the following:
> >> * Any 2.x user wanting to move to 3.x must first upgrade to the bridging
> >> release and then upgrade to the 3.x release.
> >> * With regard to state store upgrades (at least NM state stores) the
> >> bridging state stores should be aware of all new 3.x keys so the implicit
> >> assumption would be that a user can only roll back from the 3.x release
> >> to the bridging release and not to the old 2.x release.
> >> * Use the opportunity to clean up deprecated API ?
> >> * Do we even want to consider a separate bridging release for 2.7, 2.8
> >> and 2.9 lines ?
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <
> >> vinodkv@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC out
> >>> (tx Arun / Subru!) and 2.8.2 (tx Junping!), I think it's high time we
> >>> have a discussion on how we manage our developmental bandwidth between
> >>> the 2.x and 3.x lines.
> >>>
> >>> Once 3.0 GA goes out, we will have two parallel and major release
> >>> lines. The last time we were in this situation was back when we did the
> >>> 1.x -> 2.x jump.
> >>>
> >>> Parallel releases imply overhead of decisions, branch-merges and
> >>> back-ports. Right now we already do backports for 2.7.5, 2.8.2, 2.9.1,
> >>> 3.0.1 and potentially a 3.1.0 in a few months after 3.0.0 GA. And many
> >>> of these lines - e.g. 2.8, 2.9 - are going to be used for a while at a
> >>> bunch of large sites! At the same time, our users won't migrate to 3.0
> >>> GA overnight - so we do have to support two parallel lines.
> >>>
> >>> I propose we start thinking of the fate of branch-2. The idea is to have
> >>> one final release that helps our users migrate from 2.x to 3.x. This
> >>> includes any changes on the older line to bridge compatibility issues,
> >>> upgrade issues, layout changes, tooling etc.
> >>>
> >>> We have a few options I think
> >>> (A)
> >>>    -- Make 2.9.x the last minor release off branch-2
> >>>    -- Have a maintenance release that bridges 2.9 to 3.x
> >>>    -- Continue to make more maintenance releases on 2.8 and 2.9 as
> >>>       necessary
> >>>    -- All new features obviously only go into the 3.x line as no
> >>>       features can go into the maint line.
> >>>
> >>> (B)
> >>>    -- Create a new 2.10 release which doesn't have any new features,
> >>>       but serves as a bridging release
> >>>    -- Continue to make more maintenance releases on 2.8, 2.9 and 2.10
> >>>       as necessary
> >>>    -- All new features, other than the bridging changes, go into the
> >>>       3.x line
> >>>
> >>> (C)
> >>>    -- Continue making branch-2 releases and postpone this discussion
> >>>       for later
> >>>
> >>> I'm leaning towards (A) or to a lesser extent (B). Willing to hear
> >>> otherwise.
> >>>
> >>> Now, this obviously doesn't mean blocking of any more minor releases on
> >>> branch-2. Obviously, any interested committer / PMC can roll up his/her
> >>> sleeves, create a release plan and release, but we all need to
> >>> acknowledge that versions are not cheap and figure out how the community
> >>> bandwidth is split overall.
> >>>
> >>> Thanks
> >>> +Vinod
> >>> PS: The proposal is obviously not to force everyone to go in one
> >>> direction but more of a nudge to the community to figure out if we can
> >>> focus a major part of our bandwidth on one line. I had a similar concern
> >>> when we were doing 2.8 and 3.0 in parallel, but the impending
> >>> possibility of spreading too thin is much worse IMO.
> >>> PPS: (C) is a bad choice. With 2.8 and 2.9 we are already seeing user
> >>> adoption splintering between two lines. With 2.10, 2.11 etc coexisting
> >>> with 3.0, 3.1 etc, we will revisit the mad phase years ago when we had
> >>> 0.20.x, 0.20-security coexisting with 0.21, 0.22 etc.
> >>