Subject: Re: 2.7.3 release plan
From: Vinod Kumar Vavilapalli
Date: Wed, 6 Apr 2016 14:54:34 -0700
To: mapreduce-dev@hadoop.apache.org
Cc: "yarn-dev@hadoop.apache.org", "hdfs-dev@hadoop.apache.org", "common-dev@hadoop.apache.org"

Thanks Chris.

+1 for reverting from 2.7. This is the least we should do. Can you help do the needful?

I personally am not completely sold on a release with *only* the layout changes.
Like I was saying before, we can let specific users backport this into specific 2.x branches they need and leave it only on trunk / branch-2. That said, I would love to hear others’ thoughts on this, but let’s fork that discussion off from this 2.7.3 thread.

Re a fresh 2.8, I have renewed my efforts on 2.8 with a view of cutting an RC in weeks. Not sure if that does or doesn’t help this discussion.

Thanks
+Vinod

> On Apr 5, 2016, at 2:03 PM, Chris Trezzo wrote:
>
> In light of the additional conversation on HDFS-8791, I would like to
> re-propose the following:
>
> 1. Revert the new datanode layout (HDFS-8791) from the 2.7 branch. The
> layout change currently does not support downgrades which breaks our
> upgrade/downgrade policies for dot releases.
>
> 2. Cut a 2.8 release off of the 2.7.3 release with the addition of
> HDFS-8791. This would give customers a stable release that they could
> deploy with the new layout. As discussed on the jira, this is still in line
> with user expectation for minor releases as we have done layout changes in
> a number of 2.x minor releases already. The current 2.8 would become 2.9
> and continue its current release schedule.
>
> What does everyone think? If unsupported downgrades between minor releases
> is still not agreeable, then as stated by Vinod, we would need to either
> add support for downgrades with dn layout changes or revert the layout
> change from branch-2. If we are OK with the layout change in a minor
> release, but think that the issue does not affect enough customers to
> warrant a separate release, we could simply leave it in branch-2 and let it
> be released with the current 2.8.
>
>
> On Mon, Apr 4, 2016 at 1:48 PM, Vinod Kumar Vavilapalli wrote:
>
>> I commented on the JIRA way back (see
>> https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
>> saying what I said below. Unfortunately, I haven’t followed the patch along
>> after my initial comment.
>>
>> This isn’t about any specific release - starting 2.6 we declared support
>> for rolling upgrades and downgrades. Any patch that breaks this should not
>> be in branch-2.
>>
>> Two options from where I stand
>> (1) For folks who worked on the patch: Is there a way to make (a) the
>> upgrade-downgrade seamless for people who don’t care about this (b) and
>> have explicit documentation for people who care to switch this behavior on
>> and are willing to risk not having downgrades. If this means a new
>> configuration property, so be it. It’s a necessary evil.
>> (2) Just let specific users backport this into specific 2.x branches they
>> need and leave it only on trunk.
>>
>> Unless this behavior stops breaking rolling upgrades/downgrades, I think
>> we should just revert it from branch-2 and definitely 2.7.3 as it stands
>> today.
>>
>> +Vinod
>>
>>
>>> On Apr 1, 2016, at 2:54 PM, Chris Trezzo wrote:
>>>
>>> A few thoughts:
>>>
>>> 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
>>> prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
>>> for data nodes depending on your setup.
>>>
>>> 2. We have already deployed this patch internally so, with my Twitter hat
>>> on, I would be perfectly happy as long as it makes it into trunk and 2.8.
>>> That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
>>> releases on a large production cluster that has a diverse set of block ids
>>> without this patch, especially if your data nodes have a large number of
>>> disks or you are using federation. To be clear though: this highly depends
>>> on your setup and at a minimum you should verify that this regression will
>>> not affect you. The current block-id based layout in 2.6.x and 2.7.2 has a
>>> performance regression that gets worse over time. When you see it happening
>>> on a live cluster, it is one of the harder issues to identify a root cause
>>> and debug. I do understand that this is currently only affecting a smaller
>>> number of users, but I also think this number has potential to increase as
>>> time goes on. Maybe we can issue a warning in the release notes for future
>>> 2.7.x and 2.6.x releases?
>>>
>>> 3. One option (this was suggested on HDFS-8791 and I think Sean alluded to
>>> this proposal on this thread) would be to cut a 2.8 release off of the
>>> 2.7.3 release with the new layout. What people currently think of as 2.8
>>> would then become 2.9. This would give customers a stable release that they
>>> could deploy with the new layout and would not break upgrade and downgrade
>>> expectations.
>>>
>>> On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell wrote:
>>>
>>>> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
>>>> patch the release to revert HDFS-8791 before pushing it out to production.
>>>> For what it's worth.
>>>>
>>>>
>>>> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang wrote:
>>>>
>>>>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
>>>>> backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
>>>>> HDFS-8578 is a very important related fix since otherwise upgrade will be
>>>>> very slow.
>>>>>
>>>>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.wang@cloudera.com> wrote:
>>>>>
>>>>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
>>>>>> maintenance release. I've only seen it crop up on a handful of our
>>>>>> customer's clusters, and large users like Twitter and Yahoo that seem to be
>>>>>> more affected are also the most able to patch this change in themselves.
>>>>>>
>>>>>> Layout upgrades are quite disruptive, and I don't think it's worth
>>>>>> breaking upgrade and downgrade expectations when it doesn't affect the (in
>>>>>> my experience) vast majority of users.
>>>>>>
>>>>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
>>>>>> will let him elaborate.
>>>>>>
>>>>>> Best,
>>>>>> Andrew
>>>>>>
>>>>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey wrote:
>>>>>>
>>>>>>> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
>>>>>>> if *any* of them end up introducing a regression the inclusion of
>>>>>>> HDFS-8791 means that folks will have cluster downtime in order to back
>>>>>>> things out. If that happens to any substantial number of downstream
>>>>>>> folks, or any particularly vocal downstream folks, then it is very
>>>>>>> likely we'll lose the remaining trust of operators for rolling out
>>>>>>> maintenance releases. That's a pretty steep cost.
>>>>>>>
>>>>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
>>>>>>> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
>>>>>>> unreasonable burden.
>>>>>>>
>>>>>>> I agree that this fix is important, I just think we should either cut
>>>>>>> a version of 2.8 that includes it or find a way to do it that gives an
>>>>>>> operational path for rolling downgrade.
>>>>>>>
>>>>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du wrote:
>>>>>>>> Thanks for bringing up this topic, Sean.
>>>>>>>> When I released our latest Hadoop release, 2.6.4, the patch for HDFS-8791 hadn't been committed, which is why we didn't discuss this earlier.
>>>>>>>> I remember that in the JIRA discussion we treated this layout change as a Blocker bug fixing a significant performance regression, not as a normal performance improvement. And I believe the HDFS community already did their best, with care and patience, to deliver the fix and other related patches (like the upgrade fix in HDFS-8578). Take HDFS-8578 as an example: you can see 30+ rounds of patch review back and forth by senior committers, not to mention the outstanding performance test data in HDFS-8791.
>>>>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 on 2.7.3. However, that needs final confirmation from Vinod, who serves as RM for branch-2.7. In addition, I didn't see any blocker issue to bringing it into 2.6.5 now.
>>>>>>>> Just my 2 cents.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Junping
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Sean Busbey
>>>>>>>> Sent: Thursday, March 31, 2016 2:57 PM
>>>>>>>> To: hdfs-dev@hadoop.apache.org
>>>>>>>> Cc: Hadoop Common; yarn-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
>>>>>>>> Subject: Re: 2.7.3 release plan
>>>>>>>>
>>>>>>>> A layout change in a maintenance release sounds very risky. I saw some
>>>>>>>> discussion on the JIRA about those risks, but the consensus seemed to
>>>>>>>> be "we'll leave it up to the 2.6 and 2.7 release managers." I thought
>>>>>>>> we did RMs per release rather than per branch? No one claiming to be a
>>>>>>>> release manager ever spoke up AFAICT.
>>>>>>>>
>>>>>>>> Should this change be included? Should it go into a special 2.8
>>>>>>>> release as mentioned in the ticket?
>>>>>>>>
>>>>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA wrote:
>>>>>>>>> Thank you Vinod!
>>>>>>>>>
>>>>>>>>> FYI: 2.7.3 will be a bit special release.
>>>>>>>>>
>>>>>>>>> HDFS-8791 bumped up the datanode layout version,
>>>>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
>>>>>>>>> is impossible. We can rollback instead.
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
>>>>>>>>>
>>>>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Akira
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out (which
>>>>>>>>>> did go out mid February). Got a little busy since.
>>>>>>>>>>
>>>>>>>>>> Following up the 2.7.2 maintenance release, we should work towards a
>>>>>>>>>> 2.7.3. The focus obviously is to have blocker issues [1], bug-fixes and *no*
>>>>>>>>>> features / improvements.
>>>>>>>>>>
>>>>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding blocker
>>>>>>>>>> / critical issues. Will start moving out any tickets that are not blockers
>>>>>>>>>> and/or won’t fit the timeline - there are 3 blockers and 15 critical tickets
>>>>>>>>>> outstanding as of now.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> +Vinod
>>>>>>>>>>
>>>>>>>>>> [1] 2.7.3 release blockers:
>>>>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> busbey
>>>>>>>
>>>>>>> --
>>>>>>> busbey
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back.
>>>> - Piet Hein (via Tom White)