hadoop-yarn-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: 2.7.3 release plan
Date Tue, 05 Apr 2016 21:30:28 GMT
I'm +1 for ctrezzo's proposal, happy to do the revert from branch-2.7 if
this is acceptable to Vinod.

There's some additional discussion on the HDFS-8791 JIRA for those who are
only following this email thread.

Best,
Andrew

On Tue, Apr 5, 2016 at 2:03 PM, Chris Trezzo <ctrezzo@gmail.com> wrote:

> In light of the additional conversation on HDFS-8791, I would like to
> re-propose the following:
>
> 1. Revert the new datanode layout (HDFS-8791) from the 2.7 branch. The
> layout change currently does not support downgrades which breaks our
> upgrade/downgrade policies for dot releases.
>
> 2. Cut a 2.8 release off of the 2.7.3 release with the addition of
> HDFS-8791. This would give customers a stable release that they could
> deploy with the new layout. As discussed on the jira, this is still in line
> with user expectation for minor releases as we have done layout changes in
> a number of 2.x minor releases already. The current 2.8 would become 2.9
> and continue its current release schedule.
>
> What does everyone think? If unsupported downgrades between minor releases
> is still not agreeable, then as stated by Vinod, we would need to either
> add support for downgrades with dn layout changes or revert the layout
> change from branch-2. If we are OK with the layout change in a minor
> release, but think that the issue does not affect enough customers to
> warrant a separate release, we could simply leave it in branch-2 and let it
> be released with the current 2.8.
>
>
> On Mon, Apr 4, 2016 at 1:48 PM, Vinod Kumar Vavilapalli
> <vinodkv@apache.org> wrote:
>
> > I commented on the JIRA way back (see
> > https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
> > saying what I said below. Unfortunately, I haven’t followed the patch
> > along after my initial comment.
> >
> > This isn’t about any specific release - starting 2.6 we declared support
> > for rolling upgrades and downgrades. Any patch that breaks this should
> > not be in branch-2.
> >
> > Two options from where I stand
> >  (1) For folks who worked on the patch: Is there a way to make (a) the
> > upgrade-downgrade seamless for people who don’t care about this (b) and
> > have explicit documentation for people who care to switch this behavior
> > on and are willing to risk not having downgrades. If this means a new
> > configuration property, so be it. It’s a necessary evil.
> >  (2) Just let specific users backport this into specific 2.x branches
> > they need and leave it only on trunk.
> >
> > Unless this behavior stops breaking rolling upgrades/downgrades, I think
> > we should just revert it from branch-2 and definitely 2.7.3 as it stands
> > today.
> >
> > +Vinod
> >
> >
> > > On Apr 1, 2016, at 2:54 PM, Chris Trezzo <ctrezzo@gmail.com> wrote:
> > >
> > > A few thoughts:
> > >
> > > 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> > > prerequisite for HDFS-8791. Without that patch, upgrades can be very
> > > slow for data nodes depending on your setup.
> > >
> > > 2. We have already deployed this patch internally so, with my Twitter
> > > hat on, I would be perfectly happy as long as it makes it into trunk and
> > > 2.8. That being said, I would be hesitant to deploy the current 2.7.x or
> > > 2.6.x releases on a large production cluster that has a diverse set of
> > > block ids without this patch, especially if your data nodes have a large
> > > number of disks or you are using federation. To be clear though: this
> > > highly depends on your setup and at a minimum you should verify that this
> > > regression will not affect you. The current block-id based layout in
> > > 2.6.x and 2.7.2 has a performance regression that gets worse over time.
> > > When you see it happening on a live cluster, it is one of the harder
> > > issues to identify a root cause and debug. I do understand that this is
> > > currently only affecting a smaller number of users, but I also think this
> > > number has potential to increase as time goes on. Maybe we can issue a
> > > warning in the release notes for future 2.7.x and 2.6.x releases?
> > >
> > > 3. One option (this was suggested on HDFS-8791 and I think Sean alluded
> > > to this proposal on this thread) would be to cut a 2.8 release off of the
> > > 2.7.3 release with the new layout. What people currently think of as 2.8
> > > would then become 2.9. This would give customers a stable release that
> > > they could deploy with the new layout and would not break upgrade and
> > > downgrade expectations.
> > >
> > > On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurtell@apache.org>
> > > wrote:
> > >
> > >> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we
> > >> would patch the release to revert HDFS-8791 before pushing it out to
> > >> production. For what it's worth.
> > >>
> > >>
> > >> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.wang@cloudera.com>
> > >> wrote:
> > >>
> > >>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
> > >>> backported the parallel DN upgrade improvement (HDFS-8578) to
> > >>> branch-2.6. HDFS-8578 is a very important related fix since otherwise
> > >>> upgrade will be very slow.
> > >>>
> > >>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.wang@cloudera.com>
> > >>> wrote:
> > >>>
> > >>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
> > >>>> maintenance release. I've only seen it crop up on a handful of our
> > >>>> customers' clusters, and large users like Twitter and Yahoo that seem
> > >>>> to be more affected are also the most able to patch this change in
> > >>>> themselves.
> > >>>>
> > >>>> Layout upgrades are quite disruptive, and I don't think it's worth
> > >>>> breaking upgrade and downgrade expectations when it doesn't affect the
> > >>>> (in my experience) vast majority of users.
> > >>>>
> > >>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791,
> > >>>> but will let him elaborate.
> > >>>>
> > >>>> Best,
> > >>>> Andrew
> > >>>>
> > >>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <busbey@cloudera.com>
> > >>>> wrote:
> > >>>>
> > >>>>> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
> > >>>>> if *any* of them end up introducing a regression the inclusion of
> > >>>>> HDFS-8791 means that folks will have cluster downtime in order to back
> > >>>>> things out. If that happens to any substantial number of downstream
> > >>>>> folks, or any particularly vocal downstream folks, then it is very
> > >>>>> likely we'll lose the remaining trust of operators for rolling out
> > >>>>> maintenance releases. That's a pretty steep cost.
> > >>>>>
> > >>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
> > >>>>> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> > >>>>> unreasonable burden.
> > >>>>>
> > >>>>> I agree that this fix is important, I just think we should either cut
> > >>>>> a version of 2.8 that includes it or find a way to do it that gives an
> > >>>>> operational path for rolling downgrade.
> > >>>>>
> > >>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <jdu@hortonworks.com>
> > >>>>> wrote:
> > >>>>>> Thanks for bringing up this topic, Sean.
> > >>>>>> When I released our latest Hadoop release 2.6.4, the HDFS-8791 patch
> > >>>>>> hadn't been committed yet, so that's why we didn't discuss this
> > >>>>>> earlier.
> > >>>>>> I remember that in the JIRA discussion we treated this layout change
> > >>>>>> as a Blocker bug fixing a significant performance regression, not as
> > >>>>>> a normal performance improvement. And I believe the HDFS community
> > >>>>>> already did their best, with care and patience, to deliver the fix
> > >>>>>> and other related patches (like the upgrade fix in HDFS-8578). Taking
> > >>>>>> HDFS-8578 as an example, you can see 30+ rounds of patch review back
> > >>>>>> and forth by senior committers, not to mention the outstanding
> > >>>>>> performance test data in HDFS-8791.
> > >>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 on
> > >>>>>> 2.7.3. However, that needs final confirmation from Vinod, who serves
> > >>>>>> as RM for branch-2.7. In addition, I don't see any blocker to
> > >>>>>> bringing it into 2.6.5 now.
> > >>>>>> Just my 2 cents.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Junping
> > >>>>>>
> > >>>>>> ________________________________________
> > >>>>>> From: Sean Busbey <busbey@cloudera.com>
> > >>>>>> Sent: Thursday, March 31, 2016 2:57 PM
> > >>>>>> To: hdfs-dev@hadoop.apache.org
> > >>>>>> Cc: Hadoop Common; yarn-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > >>>>>> Subject: Re: 2.7.3 release plan
> > >>>>>>
> > >>>>>> A layout change in a maintenance release sounds very risky. I saw
> > >>>>>> some discussion on the JIRA about those risks, but the consensus
> > >>>>>> seemed to be "we'll leave it up to the 2.6 and 2.7 release managers."
> > >>>>>> I thought we did RMs per release rather than per branch? No one
> > >>>>>> claiming to be a release manager ever spoke up AFAICT.
> > >>>>>>
> > >>>>>> Should this change be included? Should it go into a special 2.8
> > >>>>>> release as mentioned in the ticket?
> > >>>>>>
> > >>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
> > >>>>>> <ajisakaa@oss.nttdata.co.jp> wrote:
> > >>>>>>> Thank you Vinod!
> > >>>>>>>
> > >>>>>>> FYI: 2.7.3 will be a bit special release.
> > >>>>>>>
> > >>>>>>> HDFS-8791 bumped up the datanode layout version,
> > >>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
> > >>>>>>> is impossible. We can rollback instead.
> > >>>>>>>
> > >>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
> > >>>>>>>
> > >>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Akira
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
> > >>>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> > >>>>>>>> (which did go out mid February). Got a little busy since.
> > >>>>>>>>
> > >>>>>>>> Following up the 2.7.2 maintenance release, we should work towards a
> > >>>>>>>> 2.7.3. The focus obviously is to have blocker issues [1], bug-fixes
> > >>>>>>>> and *no* features / improvements.
> > >>>>>>>>
> > >>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding
> > >>>>>>>> blocker / critical issues. Will start moving out any tickets that are
> > >>>>>>>> not blockers and/or won’t fit the timeline - there are 3 blockers and
> > >>>>>>>> 15 critical tickets outstanding as of now.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> +Vinod
> > >>>>>>>>
> > >>>>>>>> [1] 2.7.3 release blockers:
> > >>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> busbey
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> busbey
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >>
> > >>   - Andy
> > >>
> > >> Problems worthy of attack prove their worth by hitting back. - Piet
> > >> Hein (via Tom White)
> > >>
> >
> >
>
