hadoop-common-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: 2.7.3 release plan
Date Mon, 04 Apr 2016 20:48:43 GMT
I commented on the JIRA a while back (see https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=15036666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15036666),
making the same points I make below. Unfortunately, I haven’t followed the patch since my initial
comment.

This isn’t about any specific release - starting with 2.6, we declared support for rolling upgrades
and downgrades. Any patch that breaks this should not be in branch-2.

Two options from where I stand:
 (1) For folks who worked on the patch: is there a way to (a) make the upgrade-downgrade seamless
for people who don’t care about this, and (b) have explicit documentation for people who
care to switch this behavior on and are willing to risk not having downgrades? If this means
a new configuration property, so be it. It’s a necessary evil.
 (2) Just let specific users backport this into the specific 2.x branches they need, and leave
it only on trunk.

Unless this behavior stops breaking rolling upgrades/downgrades, I think we should just revert
it from branch-2, and definitely from 2.7.3 as it stands today.

+Vinod


> On Apr 1, 2016, at 2:54 PM, Chris Trezzo <ctrezzo@gmail.com> wrote:
> 
> A few thoughts:
> 
> 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
> for data nodes depending on your setup.
> 
> 2. We have already deployed this patch internally so, with my Twitter hat
> on, I would be perfectly happy as long as it makes it into trunk and 2.8.
> That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
> releases on a large production cluster that has a diverse set of block ids
> without this patch, especially if your data nodes have a large number of
> disks or you are using federation. To be clear though: this highly depends
> on your setup and at a minimum you should verify that this regression will
> not affect you. The current block-id based layout in 2.6.x and 2.7.2 has a
> performance regression that gets worse over time. When you see it happening
> on a live cluster, it is one of the harder issues to identify a root cause
> and debug. I do understand that this is currently only affecting a smaller
> number of users, but I also think this number has potential to increase as
> time goes on. Maybe we can issue a warning in the release notes for future
> 2.7.x and 2.6.x releases?
> 
> 3. One option (this was suggested on HDFS-8791 and I think Sean alluded to
> this proposal on this thread) would be to cut a 2.8 release off of the
> 2.7.3 release with the new layout. What people currently think of as 2.8
> would then become 2.9. This would give customers a stable release that they
> could deploy with the new layout and would not break upgrade and downgrade
> expectations.
> 
> On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell <apurtell@apache.org> wrote:
> 
>> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
>> patch the release to revert HDFS-8791 before pushing it out to production.
>> For what it's worth.
>> 
>> 
>> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang <andrew.wang@cloudera.com>
>> wrote:
>> 
>>> One other thing I wanted to bring up regarding HDFS-8791, we haven't
>>> backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
>>> HDFS-8578 is a very important related fix since otherwise upgrade will be
>>> very slow.
>>> 
>>> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang <andrew.wang@cloudera.com>
>>> wrote:
>>> 
>>>> As I expressed on HDFS-8791, I do not want to include this JIRA in a
>>>> maintenance release. I've only seen it crop up on a handful of our
>>>> customers' clusters, and large users like Twitter and Yahoo that seem to
>>>> be more affected are also the most able to patch this change in
>>>> themselves.
>>>> 
>>>> Layout upgrades are quite disruptive, and I don't think it's worth
>>>> breaking upgrade and downgrade expectations when it doesn't affect the
>>>> (in my experience) vast majority of users.
>>>> 
>>>> Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
>>>> will let him elaborate.
>>>> 
>>>> Best,
>>>> Andrew
>>>> 
>>>> On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey <busbey@cloudera.com>
>>>> wrote:
>>>> 
>>>>> As of 2 days ago, there were already 135 jiras associated with 2.7.3.
>>>>> If *any* of them ends up introducing a regression, the inclusion of
>>>>> HDFS-8791 means that folks will have cluster downtime in order to back
>>>>> things out. If that happens to any substantial number of downstream
>>>>> folks, or any particularly vocal downstream folks, then it is very
>>>>> likely we'll lose the remaining trust of operators for rolling out
>>>>> maintenance releases. That's a pretty steep cost.
>>>>> 
>>>>> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
>>>>> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
>>>>> unreasonable burden.
>>>>> 
>>>>> I agree that this fix is important, I just think we should either cut
>>>>> a version of 2.8 that includes it or find a way to do it that gives an
>>>>> operational path for rolling downgrade.
>>>>> 
>>>>> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du <jdu@hortonworks.com> wrote:
>>>>>> Thanks for bringing up this topic, Sean.
>>>>>> When I released our latest Hadoop release, 2.6.4, the patch for
>>>>>> HDFS-8791 hadn't been committed yet, which is why we didn't discuss
>>>>>> this earlier.
>>>>>> I remember that in the JIRA discussion we treated this layout change
>>>>>> as a Blocker bug fixing a significant performance regression, not as
>>>>>> a normal performance improvement. I believe the HDFS community already
>>>>>> did its best, with care and patience, to deliver the fix and other
>>>>>> related patches (like the upgrade fix in HDFS-8578). Taking HDFS-8578
>>>>>> as an example, you can see 30+ rounds of patch review back and forth
>>>>>> by senior committers, not to mention the outstanding performance test
>>>>>> data in HDFS-8791.
>>>>>> I would trust our HDFS committers' judgement to land HDFS-8791 on
>>>>>> 2.7.3. However, that needs final confirmation from Vinod, who serves
>>>>>> as RM for branch-2.7. In addition, I don't see any blocker preventing
>>>>>> it from going into 2.6.5 now.
>>>>>> Just my 2 cents.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Junping
>>>>>> 
>>>>>> ________________________________________
>>>>>> From: Sean Busbey <busbey@cloudera.com>
>>>>>> Sent: Thursday, March 31, 2016 2:57 PM
>>>>>> To: hdfs-dev@hadoop.apache.org
>>>>>> Cc: Hadoop Common; yarn-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
>>>>>> Subject: Re: 2.7.3 release plan
>>>>>> 
>>>>>> A layout change in a maintenance release sounds very risky. I saw some
>>>>>> discussion on the JIRA about those risks, but the consensus seemed to
>>>>>> be "we'll leave it up to the 2.6 and 2.7 release managers." I thought
>>>>>> we did RMs per release rather than per branch? No one claiming to be a
>>>>>> release manager ever spoke up AFAICT.
>>>>>> 
>>>>>> Should this change be included? Should it go into a special 2.8
>>>>>> release as mentioned in the ticket?
>>>>>> 
>>>>>> On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
>>>>>> <ajisakaa@oss.nttdata.co.jp> wrote:
>>>>>>> Thank you Vinod!
>>>>>>> 
>>>>>>> FYI: 2.7.3 will be a somewhat special release.
>>>>>>> 
>>>>>>> HDFS-8791 bumped up the datanode layout version,
>>>>>>> so rolling downgrade from 2.7.3 to 2.7.[0-2]
>>>>>>> is impossible. We can roll back instead.
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/HDFS-8791
>>>>>>> 
>>>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Akira
>>>>>>> 
>>>>>>> 
>>>>>>> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
>>>>>>>> (which did go out mid February). Got a little busy since.
>>>>>>>> 
>>>>>>>> Following up the 2.7.2 maintenance release, we should work towards a
>>>>>>>> 2.7.3. The focus obviously is to have blocker issues [1], bug-fixes
>>>>>>>> and *no* features / improvements.
>>>>>>>> 
>>>>>>>> I hope to cut an RC in a week - giving enough time for outstanding
>>>>>>>> blocker / critical issues. Will start moving out any tickets that are
>>>>>>>> not blockers and/or won’t fit the timeline - there are 3 blockers and
>>>>>>>> 15 critical tickets outstanding as of now.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> +Vinod
>>>>>>>> 
>>>>>>>> [1] 2.7.3 release blockers:
>>>>>>>> https://issues.apache.org/jira/issues/?filter=12335343
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> busbey
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> busbey
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> 
>>   - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>> 

