hadoop-yarn-dev mailing list archives

From Chris Trezzo <ctre...@gmail.com>
Subject Re: 2.7.3 release plan
Date Tue, 05 Apr 2016 21:03:13 GMT
In light of the additional conversation on HDFS-8791, I would like to
re-propose the following:

1. Revert the new datanode layout (HDFS-8791) from the 2.7 branch. The
layout change currently does not support downgrades, which breaks our
upgrade/downgrade policies for dot releases.

2. Cut a 2.8 release off of the 2.7.3 release with the addition of
HDFS-8791. This would give customers a stable release that they could
deploy with the new layout. As discussed on the jira, this is still in line
with user expectation for minor releases as we have done layout changes in
a number of 2.x minor releases already. The current 2.8 would become 2.9
and continue its current release schedule.

What does everyone think? If unsupported downgrades between minor releases
are still not agreeable, then as stated by Vinod, we would need to either
add support for downgrades with dn layout changes or revert the layout
change from branch-2. If we are OK with the layout change in a minor
release, but think that the issue does not affect enough customers to
warrant a separate release, we could simply leave it in branch-2 and let it
be released with the current 2.8.
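
To make the downgrade concern concrete, here is a minimal sketch of why a
layout version bump blocks a rolling downgrade. It assumes the usual HDFS
convention that layout versions are negative integers that decrease with
each layout change. The function names and the two version numbers are
hypothetical placeholders for illustration, not taken from the Hadoop
source.

```python
# Illustrative sketch only: names and version numbers are placeholders,
# not values or APIs taken from the Hadoop source.


def can_read_layout(software_layout: int, on_disk_layout: int) -> bool:
    """Layout versions are negative and decrease with each change, so a
    release understands any on-disk layout at least as old as its own."""
    return on_disk_layout >= software_layout


def rolling_downgrade_possible(current_layout: int, target_layout: int) -> bool:
    """After an upgrade the disks carry current_layout; the older target
    release must be able to read that layout without a rollback."""
    return can_read_layout(target_layout, current_layout)


OLD_LAYOUT = -56  # hypothetical: datanode layout before HDFS-8791
NEW_LAYOUT = -57  # hypothetical: datanode layout after HDFS-8791

# Same layout on both sides: the older release can still read the disks.
print(rolling_downgrade_possible(OLD_LAYOUT, OLD_LAYOUT))

# Layout bumped: the older release cannot read the newer on-disk layout.
print(rolling_downgrade_possible(NEW_LAYOUT, OLD_LAYOUT))
```

Under these assumptions the first check passes and the second fails, which
is the situation a dot release with HDFS-8791 would put operators in.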


On Mon, Apr 4, 2016 at 1:48 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org>
wrote:

> I commented on the JIRA way back (see
> https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
> saying what I said below. Unfortunately, I haven’t followed the patch along
> after my initial comment.
>
> This isn’t about any specific release - starting 2.6 we declared support
> for rolling upgrades and downgrades. Any patch that breaks this should not
> be in branch-2.
>
> Two options from where I stand
>  (1) For folks who worked on the patch: Is there a way to (a) make the
> upgrade-downgrade seamless for people who don’t care about this and (b)
> have explicit documentation for people who care to switch this behavior on
> and are willing to risk not having downgrades? If this means a new
> configuration property, so be it. It’s a necessary evil.
>  (2) Just let specific users backport this into specific 2.x branches they
> need and leave it only on trunk.
>
> Unless this behavior stops breaking rolling upgrades/downgrades, I think
> we should just revert it from branch-2 and definitely 2.7.3 as it stands
> today.
>
> +Vinod
>
>
> > On Apr 1, 2016, at 2:54 PM, Chris Trezzo <ctrezzo@gmail.com> wrote:
> >
> > A few thoughts:
> >
> > 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> > prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
> > for data nodes depending on your setup.
> >
> > 2. We have already deployed this patch internally so, with my Twitter hat
> > on, I would be perfectly happy as long as it makes it into trunk and 2.8.
> > That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
> > releases on a large production cluster that has a diverse set of block
> > ids without this patch, especially if your data nodes have a large
> > number of disks or you are using federation. To be clear though: this
> > highly depends on your setup and at a minimum you should verify that
> > this regression will not affect you. The current block-id based layout
> > in 2.6.x and 2.7.2 has a performance regression that gets worse over
> > time. When you see it happening on a live cluster, it is one of the
> > harder issues to identify a root cause and debug. I do understand that
> > this is currently only affecting a smaller number of users, but I also
> > think this number has potential to increase as time goes on. Maybe we
> > can issue a warning in the release notes for future 2.7.x and 2.6.x
> > releases?
> >
> > 3. One option (this was suggested on HDFS-8791 and I think Sean alluded
> > to this proposal on this thread) would be to cut a 2.8 release off of
> > the 2.7.3 release with the new layout. What people currently think of
> > as 2.8 would then become 2.9. This would give customers a stable
> > release that they could deploy with the new layout and would not break
> > upgrade and downgrade expectations.
> >
> > On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> >> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we
> >> would patch the release to revert HDFS-8791 before pushing it out to
> >> production. For what it's worth.
> >>
> >>
> >> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.wang@cloudera.com>
> >> wrote:
> >>
> >>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
> >>> backported the parallel DN upgrade improvement (HDFS-8578) to
> >>> branch-2.6. HDFS-8578 is a very important related fix since otherwise
> >>> upgrade will be very slow.
> >>>
> >>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.wang@cloudera.com>
> >>> wrote:
> >>>
> >>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
> >>>> maintenance release. I've only seen it crop up on a handful of our
> >>>> customers' clusters, and large users like Twitter and Yahoo that seem
> >>>> to be more affected are also the most able to patch this change in
> >>>> themselves.
> >>>>
> >>>> Layout upgrades are quite disruptive, and I don't think it's worth
> >>>> breaking upgrade and downgrade expectations when it doesn't affect
> >>>> the (in my experience) vast majority of users.
> >>>>
> >>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791,
> >>>> but will let him elaborate.
> >>>>
> >>>> Best,
> >>>> Andrew
> >>>>
> >>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <busbey@cloudera.com>
> >>>> wrote:
> >>>>
> >>>>> As of 2 days ago, there were already 135 jiras associated with 2.7.3.
> >>>>> If *any* of them end up introducing a regression, the inclusion of
> >>>>> HDFS-8791 means that folks will have cluster downtime in order to
> >>>>> back things out. If that happens to any substantial number of
> >>>>> downstream folks, or any particularly vocal downstream folks, then it
> >>>>> is very likely we'll lose the remaining trust of operators for
> >>>>> rolling out maintenance releases. That's a pretty steep cost.
> >>>>>
> >>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having
> >>>>> to be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> >>>>> unreasonable burden.
> >>>>>
> >>>>> I agree that this fix is important, I just think we should either
> >>>>> cut a version of 2.8 that includes it or find a way to do it that
> >>>>> gives an operational path for rolling downgrade.
> >>>>>
> >>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <jdu@hortonworks.com>
> >>>>> wrote:
> >>>>>> Thanks for bringing up this topic, Sean.
> >>>>>> When I released our latest Hadoop release, 2.6.4, the patch for
> >>>>>> HDFS-8791 had not been committed yet, which is why we didn't
> >>>>>> discuss this earlier.
> >>>>>> I remember that in the JIRA discussion we previously treated this
> >>>>>> layout change as a Blocker bug fixing a significant performance
> >>>>>> regression, not as a normal performance improvement. I believe the
> >>>>>> HDFS community has already done its best, with care and patience,
> >>>>>> to deliver the fix and other related patches (like the upgrade fix
> >>>>>> in HDFS-8578). Taking HDFS-8578 as an example, you can see 30+
> >>>>>> rounds of patch review back and forth by senior committers, not to
> >>>>>> mention the outstanding performance test data in HDFS-8791.
> >>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 in
> >>>>>> 2.7.3. However, that needs final confirmation from Vinod, who
> >>>>>> serves as RM for branch-2.7. In addition, I don't see any blocker
> >>>>>> issue to bringing it into 2.6.5 now.
> >>>>>> Just my 2 cents.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Junping
> >>>>>>
> >>>>>> ________________________________________
> >>>>>> From: Sean Busbey <busbey@cloudera.com>
> >>>>>> Sent: Thursday, March 31, 2016 2:57 PM
> >>>>>> To: hdfs-dev@hadoop.apache.org
> >>>>>> Cc: Hadoop Common; yarn-dev@hadoop.apache.org;
> >>>>>> mapreduce-dev@hadoop.apache.org
> >>>>>> Subject: Re: 2.7.3 release plan
> >>>>>>
> >>>>>> A layout change in a maintenance release sounds very risky.
> >>>>>> I saw some discussion on the JIRA about those risks, but the
> >>>>>> consensus seemed to be "we'll leave it up to the 2.6 and 2.7
> >>>>>> release managers." I thought we did RMs per release rather than per
> >>>>>> branch? No one claiming to be a release manager ever spoke up
> >>>>>> AFAICT.
> >>>>>>
> >>>>>> Should this change be included? Should it go into a special
> >>>>>> 2.8 release as mentioned in the ticket?
> >>>>>>
> >>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
> >>>>>> <ajisakaa@oss.nttdata.co.jp> wrote:
> >>>>>>> Thank you Vinod!
> >>>>>>>
> >>>>>>> FYI: 2.7.3 will be a somewhat special release.
> >>>>>>>
> >>>>>>> HDFS-8791 bumped up the datanode layout version,
> >>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
> >>>>>>> is impossible. We can roll back instead.
> >>>>>>>
> >>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
> >>>>>>>
> >>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Akira
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> >>>>>>>> (which did go out mid February). Got a little busy since.
> >>>>>>>>
> >>>>>>>> Following up the 2.7.2 maintenance release, we should work towards
> >>>>>>>> a 2.7.3. The focus obviously is to have blocker issues [1],
> >>>>>>>> bug-fixes and *no* features / improvements.
> >>>>>>>>
> >>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding
> >>>>>>>> blocker / critical issues. Will start moving out any tickets that
> >>>>>>>> are not blockers and/or won’t fit the timeline - there are 3
> >>>>>>>> blockers and 15 critical tickets outstanding as of now.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> +Vinod
> >>>>>>>>
> >>>>>>>> [1] 2.7.3 release blockers:
> >>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> busbey
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> busbey
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >>   - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >>
>
>
