hive-dev mailing list archives

From Sergio Pena <sergio.p...@cloudera.com>
Subject Re: Backward incompatible changes
Date Thu, 09 Mar 2017 20:01:32 GMT
Hey Ashutosh, thanks for soliciting feedback on this.

I like the idea you're proposing; maintaining compatibility and at the
same time adding newer features to
Hive consumes a lot of development time and effort.

However, I think some users and companies have just started to use the Hive 2.x
branch as their main major upgrade of Hive
(possibly because they were waiting for stabilization and testing upgrades), and
cutting this major branch after just 1 year of life
might make us look like we will forget about the quality of Hive 2.x as we
did with branch-1.

Hive 1.x latest version was 1.2, and its development stopped because of new
features on Hive 2.x.
Hive 2.x latest version is 2.1, and we want to create Hive 3.x because of
newer features and incompatibilities.
Will Hive 3.x have the same future after 3.1 is released?

I'm also concerned about these three things:

   - *Branch-2 quality commitment*.
   How will we keep the community motivated on fixing both master and
   branch-2?
   - *Harder cherry-picks between master and branch-2*.
   Because master will be incompatible by nature, cherry-picks to
   branch-2 will be harder.
   - *Removal of MR2 on the master branch*.
   This was marked as deprecated just last year, but MR2 is still an engine
   that is used by several users.

I accept that the end of life of major versions will come at some point,
and these concerns will expire,
but Hive 2.x is kind of young, isn't it?

Should we try to stabilize the Hive 2.x line first, and have a few more
releases before starting to work on Hive 3.0?
Should we add more test coverage to the Hive Jenkins jobs to validate Hive 2.x
quality?
Should we agree on a date for dropping community support for
Hive versions, so that users know about it in advance?

Again, I like your proposal, but I'm afraid that users who just upgraded to
2.x won't get any more features and improvements
because those will be developed on 3.0.

- Sergio



On Mon, Mar 6, 2017 at 1:24 PM, Ashutosh Chauhan <ashutosh.chauhan@gmail.com> wrote:

> The way it helps shedding debt  is because dev can now do refactoring
> without fear of breaking some rarely used features. The way that helps for
> adding feature faster is since codebase is lean and easier to reason about
> its much easier to add new features.
>
> More importantly though, it also helps users because we are setting the
> expectation from dev community. They can expect that future releases of 2.x
> to be backward compatible. At the same time whenever they decide to upgrade
> they only need to test their application once against 3.x as oppose to
> continuous breakage of one form or another if we continue to make
> incompatible changes in master without branching for 2.x
>
> Thanks,
> Ashutosh
>
> On Sat, Mar 4, 2017 at 10:19 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> > Also i dont follow how we remove
> >
> > On Saturday, March 4, 2017, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > >
> > >
> > > On Fri, Mar 3, 2017 at 8:46 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> > >
> > >> +1
> > >> There are some features that are incomplete and what I would not
> > >> recommend for any real production use. The 'legacy authorization mode'
> > >> is a great example of that -
> > >> https://cwiki.apache.org/confluence/display/Hive/Hive+Default+Authorization+-+Legacy+Mode
> > >> . It is inherently insecure mode that nobody should be using.
> > >>
> > >> There is also potential to cleanup of the thrift api. However, there are
> > >> many users of this api, we would need to go the deprecation then remove
> > >> after couple of releases route or so for that.
> > >>
> > >> I am sure there are many other candidates. We will have to evaluate each
> > >> of those features on the risk/benefit of keeping them and arriving at a
> > >> decision.
> > >>
> > >> Also, +1 on getting a 2.2 release out before we branch.
> > >>
> > >>
> > >>
> > >> On Fri, Mar 3, 2017 at 1:50 PM, Ashutosh Chauhan <hashutosh@apache.org> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > Hive project has come a long way. With wide-spread adoption also comes
> > >> > expectations. Expectation of being backward compatible and not breaking
> > >> > things. However that doesn't come free of cost and results in lot of
> > >> > legacy code which can't be refactored without fear of breaking things.
> > >> > As a result project has accumulated lot of debt over time. At the same
> > >> > time there are also lot of features which have seen little uptake. We
> > >> > may want to drop some of those.
> > >> >
> > >> > In order to move forward and shed that debt we may need a major version
> > >> > release which allows us to make backward incompatible changes and drop
> > >> > rarely used features. At the same time there are lots of users which are
> > >> > consuming currently released 2.1 , 2.2 branches and expect them to stay
> > >> > on it for some time. So, I propose that we create branch-2 from current
> > >> > tip and do future 2.x releases from that branch and keep it backward
> > >> > compatible. This will allow devs to land breaking changes on master and
> > >> > pave way to release hive 3.0 in future.
> > >> >
> > >> > Ofcourse, each specific incompatible change and feature drop  even on
> > >> > master need to be evaluated on its own merit on corresponding jira. This
> > >> > email is just a solicitation of feedback for creating branch-2 and
> > >> > allowing breaking changes in master. Thoughts?
> > >> >
> > >> > Thanks,
> > >> > Ashutosh
> > >> >
> > >>
> > >
> > > One of the challenges of the developers conducting the risk-benefit
> > > analysis are that the developers are mostly focused on new features, but
> > > there are deployments of hive that are 5+ years old and people that rely
> > > on the features are not on the mailing list.
> > >
> > > For example I developed and use this frequently:
> > >
> > > https://community.hortonworks.com/articles/8861/apache-hive-groovy-udf-examples.html
> > >
> > > My career went away from hive for a while. I was quite surprised to find
> > > out the cli->beeline it was more or less decided not to port it. I learned
> > > of this the first time I was forced to work in a hive server only
> > > environment and it did not work.
> > >
> > > Now I have to go and spend time adding this back so I don't have to work
> > > around it not being there.
> > >
> > > What we should do continue/doing is making code that is modular we need to
> > > break hard dependencies like ThriftSerde or OrcSerde being "native" and
> > > having to be linked to the metastore move them out into proper submodules.
> > > There is too much code that only works for one implementation of a serde
> > > etc.
> > >
> > >
> > >
> >
> > I would like a timeline to understand this. It sounds as if master is not
> > releasable currently, so already broken in a way. We make a branch and
> > aggreasively break it more?
> >
> > Im not following what makes this branching policy makes adding features
> > faster or how it helps shed debt faster.
> >
> >
> > --
> > Sorry this was sent from mobile. Will do less grammar and spell check than
> > usual.
> >
>
