hive-dev mailing list archives

From Alan Gates <>
Subject Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Date Fri, 22 May 2015 17:56:59 GMT
I don't think anyone is advocating for option 2, as that would be 
disastrous.  Option 3 is closest to what I'm proposing, though again 
dropping support for Hadoop 1 is only a part of it.


> Alexander Pivovarov <>
> May 22, 2015 at 10:03
> Looks like we are discussing 3 options:
> 1. Support hadoop 1, 2 and 3 in master branch.
> 2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in 
> branch-3
> 3. Support hadoop 2 and 3 in master
> I do not think option 2 is a good solution because it is much more
> difficult to manage 3 active prod branches than one master branch.
> I think we should go with option 1 or 3.
> +1 on Xuefu's and Edward's opinions
> Sergey Shelukhin <>
> May 22, 2015 at 9:08
> I think branch-2 doesn’t need to be framed as particularly adventurous
> (other than due to the general increase in the amount of work done in Hive
> by the community).
> All the new features that normally go on trunk/master will go to branch-2.
> branch-2 is just trunk as it is now; in fact there will be no branch-2,
> just master :) The difference is the dropped functionality, not any added
> functionality.
> So you shouldn’t lose stability if you retain the same process as now by
> just staying on versions off master.
> Perhaps, as is usually the case in Apache projects, developing features on
> older branches would be discouraged. Right now, all features usually go on
> trunk/master, and are then back ported as needed and practical; so you
> wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
> and not back port to master.
> Chris Drome <>
> May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where 
> more disruptive work can go on without affecting branch-1. While not 
> necessarily against this approach, from Yahoo's standpoint, I do have 
> some questions (concerns).
> Upgrading to a new version of Hive requires a significant commitment 
> of time and resources to stabilize and certify a build for deployment 
> to our clusters. Given the size of our clusters and scale of datasets, 
> we have to be particularly careful about adopting new functionality. 
> However, at the same time we are interested in testing and making 
> available new features and functionality. That said, we would have to 
> rely on branch-1 for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point 
> there would be no option but for users to move to branch-2 as branch-1 
> would be effectively end-of-lifed. I'm not sure how long this would 
> take, but it would eventually happen as a direct result of the very 
> reason for creating branch-2.
> A related concern is how disruptive the code changes will be in 
> branch-2. I imagine that changes early in branch-2 will be easy to 
> backport to branch-1, while this effort will become more difficult, if 
> not impractical, as time goes on. If the code bases diverge too much then 
> this could lead to more pressure for users of branch-1 to add features 
> just to branch-1, which has been mentioned as undesirable. By the same 
> token, backporting any code in branch-2 will require an increasing 
> amount of effort, which contributors to branch-2 may not be interested 
> in committing to.
> These questions affect us directly because, while we require a certain 
> amount of stability, we also like to pull in new functionality that 
> will be of value to our users. For example, our current 0.13 release 
> is probably closer to 0.14 at this point. Given the lifespan of a 
> release, it is often more palatable to backport features and bugfixes 
> than to jump to a new version.
> The good thing about this proposal is the opportunity to evaluate and 
> clean up a lot of the old code.
> Thanks,
> chris
> Sergey Shelukhin <>
> May 18, 2015 at 11:47
> Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
> people are set in their ways or have practical considerations and don’t
> care for new shiny stuff.
> Sergey Shelukhin <>
> May 18, 2015 at 11:46
> I think we need some path for deprecating old Hadoop versions, the same
> way we deprecate old Java version support or old RDBMS version support.
> At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
> goes for stuff like MR; supporting it, esp. for perf work, becomes a
> burden, and it’s outdated with 2 alternatives, one of which has been
> around for 2 releases.
> The branches are a graceful way to get rid of the legacy burden.
> Alternatively, when sweeping changes are made, we can do what HBase did
> (which is not pretty imho), where the 0.94 version had ~30 dot releases
> because people cannot upgrade to the 0.96 “singularity” release.
> I posit that people who run Hadoop 1 and MR in this day and age (and more
> so as time passes) are people who don’t care about perf and new
> features, only stability; so a stability-focused branch would be perfect to
> support them.
