From: Alan Gates
Date: Fri, 22 May 2015 10:56:59 -0700
To: dev@hive.apache.org
Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features

I don't think anyone is advocating for option 2, as that would be disastrous. Option 3 is closest to what I'm proposing, though again dropping support for Hadoop 1 is only a part of it.

Alan.

Alexander Pivovarov
May 22, 2015 at 10:03
Looks like we are discussing 3 options:

1. Support hadoop 1, 2 and 3 in master branch.

2. Support hadoop 1 in branch-1, hadoop 2 in branch-2, hadoop 3 in branch-3

3. Support hadoop 2 and 3 in master

I do not think option 2 is a good solution because it is much more difficult
to manage 3 active prod branches than one master branch.

I think we should go with options 1 or 3.

+1 on Xuefu's and Edward's opinions

Sergey Shelukhin
May 22, 2015 at 9:08
I think branch-2 doesn’t need to be framed as particularly adventurous
(other than due to the general increase in the amount of work done in Hive by
the community).
All the new features that normally go on trunk/master will go to branch-2.
branch-2 is just trunk as it is now; in fact there will be no branch-2,
just master :) The difference is the dropped functionality, not added functionality.
So you shouldn’t lose stability if you retain the same process as now by
just staying on versions off master.

Perhaps, as is usually the case in Apache projects, developing features on
older branches would be discouraged. Right now, all features usually go on
trunk/master, and are then backported as needed and practical; so you
wouldn’t (in Apache) make a feature on Hive 0.14 to be released in 0.14.N,
and not backport to master.


Chris Drome
May 22, 2015 at 0:49
I understand the motivation and benefits of creating a branch-2 where more disruptive work can go on without affecting branch-1. While not necessarily against this approach, from Yahoo's standpoint, I do have some questions (concerns).
Upgrading to a new version of Hive requires a significant commitment of time and resources to stabilize and certify a build for deployment to our clusters. Given the size of our clusters and scale of datasets, we have to be particularly careful about adopting new functionality. However, at the same time we are interested in testing and making available new features and functionality. That said, we would have to rely on branch-1 for the immediate future.
One concern is that branch-1 would be left to stagnate, at which point there would be no option but for users to move to branch-2 as branch-1 would be effectively end-of-lifed. I'm not sure how long this would take, but it would eventually happen as a direct result of the very reason for creating branch-2.
A related concern is how disruptive the code changes will be in branch-2. I imagine that changes early in branch-2 will be easy to backport to branch-1, while this effort will become more difficult, if not impractical, as time goes on. If the code bases diverge too much then this could lead to more pressure for users of branch-1 to add features just to branch-1, which has been mentioned as undesirable. By the same token, backporting any code in branch-2 will require an increasing amount of effort, which contributors to branch-2 may not be interested in committing to.
These questions affect us directly because, while we require a certain amount of stability, we also like to pull in new functionality that will be of value to our users. For example, our current 0.13 release is probably closer to 0.14 at this point. Given the lifespan of a release, it is often more palatable to backport features and bugfixes than to jump to a new version.

The good thing about this proposal is the opportunity to evaluate and clean up a lot of the old code.
Thanks,
chris



On Monday, May 18, 2015 11:48 AM, Sergey Shelukhin <sergey@hortonworks.com> wrote:


Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.





Sergey Shelukhin
May 18, 2015 at 11:47
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.


Sergey Shelukhin
May 18, 2015 at 11:46
I think we need some path for deprecating old Hadoop versions, the same
way we deprecate old Java version support or old RDBMS version support.
At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases.
The branches are a graceful way to get rid of the legacy burden.

Alternatively, when sweeping changes are made, we can do what HBase did
(which is not pretty imho), where the 0.94 version had ~30 dot releases
because people cannot upgrade to the 0.96 “singularity” release.


I posit that people who run Hadoop 1 and MR in this day and age (and more
so as time passes) are people who don’t care about perf and new
features, only stability; so a stability-focused branch would be perfect to
support them.


