hive-dev mailing list archives

From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: [DISCUSS] Deprecating Hive CLI
Date Fri, 01 May 2015 01:15:04 GMT
Okay. That's fine. I think supporting an env variable doesn't take much.
What about enabling the new code path by default, and allowing users to
opt out in case of a serious bug? We can also give users a warning that the
env variable may be discontinued in the future.
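
To make the opt-out idea concrete, here is a minimal sketch of the selection
logic, not the actual launcher code. It assumes the variable name
CLI_USE_BEELINE from Thejas's example further down the thread; the set of
false-like values, the function name choose_cli_impl and the returned labels
are hypothetical and only for illustration.

```python
import os

# Hypothetical variable name, borrowed from the example later in this thread.
ENV_VAR = "CLI_USE_BEELINE"


def choose_cli_impl(env):
    """Return (implementation, warning) for the "hive" command.

    The beeline-based path is the default. The old CLI path is used only
    when the variable is explicitly set to a false-like value, and in that
    case a warning about the opt-out going away is returned as well.
    """
    value = env.get(ENV_VAR, "").strip().lower()
    if value in ("false", "0", "no"):
        warning = (ENV_VAR + " opt-out is temporary and may be "
                   "discontinued in a future release.")
        return "legacy-cli", warning
    return "beeline", None


if __name__ == "__main__":
    impl, warning = choose_cli_impl(os.environ)
    if warning:
        print("WARNING: " + warning)
    print("hive command would use: " + impl)
```

With the variable unset the sketch picks the beeline path silently; only an
explicit opt-out triggers the old path and the warning.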

thanks,
Xuefu

On Thu, Apr 30, 2015 at 5:13 PM, Thejas Nair <thejas.nair@gmail.com> wrote:

> In most cases with hive, when a major implementation change is made,
> we usually give the user a way to fall back to the older implementation.
> For example, when CBO was added, it was initially not enabled by default,
> and there is still the option of using the non-CBO path. When new hadoop
> major versions are added, we still give users the option of using older
> hadoop versions for some time. Or in the case of jdbc, we allowed users to
> choose between HiveServer1 and 2 for some time. Even with good effort put
> into testing, some corner cases sometimes get missed.
>
> On similar lines, it would be good to let users opt in for a release, and
> then switch the default in the next release. Given that we have been
> making new releases of hive every few months, I don't see this as a
> big issue. I think we should at the minimum allow users to opt out of
> the new implementation for a release or so (if they encounter bugs).
>
> Most of the work is going to be in ensuring compatibility.
> Supporting a flag to choose the implementation should be relatively
> simple work. What do you think?
>
> On Thu, Apr 30, 2015 at 4:42 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
> > Hi Thejas,
> >
> > Thanks for your input. I thought about this, but I don't really feel it
> > necessary to have a "transition" stage. After all, Hive CLI is a command
> > line tool with well-defined command line options. That's the "interface"
> > that we need to support. We are just changing the implementation. Through
> > comprehensive testing, we hope to discover most of the issues.
> >
> > On the other hand, if we have such a transition, there might never be a
> > user bothering to flip the env variable, and the transition doesn't really
> > build up more confidence.
> >
> > In addition, if we provide either a transition or a switch for every
> > implementation change, wouldn't users be overwhelmed by those transitions
> > or switches?
> >
> > Thoughts?
> >
> > Thanks,
> > Xuefu
> >
> > On Thu, Apr 30, 2015 at 3:10 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
> >
> >> Hi Xuefu,
> >> What is the plan you have in mind for a transition to using beeline
> >> from within hive?
> >> I assume there is going to be some translation from hive cli options
> >> and commands to beeline. Is that right?
> >> Once the translation is in place, how would the switch happen?
> >>
> >> I am thinking that once there is a hive-cli compatible beeline mode,
> >> there can be an option to switch between the beeline and hive cli
> >> codebases.
> >> For example,
> >> In hive version X, when the environment variable CLI_USE_BEELINE=true
> >> is set, the "hive" command uses beeline underneath (the default remains
> >> the cli codepath, so that users can start experimenting with the "hive"
> >> command's beeline mode).
> >> In hive version Y > X, the "hive" command starts using beeline
> >> underneath by default.
> >>
> >> Is something like this what you have in mind?
> >>
> >> Thanks,
> >> Thejas
> >>
> >>
> >>
> >> On Mon, Apr 27, 2015 at 5:31 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
> >> > FYI, I have created an uber JIRA for this:
> >> > https://issues.apache.org/jira/browse/HIVE-10511.
> >> >
> >> > Thanks,
> >> > Xuefu
> >> >
> >> > On Mon, Apr 27, 2015 at 4:54 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
> >> >
> >> >> Yes, Olga. I will create JIRAs to track those.
> >> >>
> >> >> Thanks,
> >> >> Xuefu
> >> >>
> >> >> On Mon, Apr 27, 2015 at 4:51 PM, Olga L. Natkovich <
> >> >> olgan@yahoo-inc.com.invalid> wrote:
> >> >>
> >> >>> We would need to build a test suite that makes sure that the new
> >> >>> implementation is compatible with the old one for users to adopt it.
> >> >>> We would also need some benchmarks to compare performance. Could you
> >> >>> please include this in the proposal as well?
> >> >>> Thanks,
> >> >>> Olga
> >> >>>       From: Xuefu Zhang <xzhang@cloudera.com>
> >> >>>  To: "dev@hive.apache.org" <dev@hive.apache.org>
> >> >>>  Sent: Monday, April 27, 2015 4:46 PM
> >> >>>  Subject: Re: [DISCUSS] Deprecating Hive CLI
> >> >>>
> >> >>> The existing implementation of Hive CLI will be replaced, so that the
> >> >>> Hive community doesn't need to maintain two code paths for the same
> >> >>> thing. That's basically what option #2 provides.
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Apr 27, 2015 at 4:01 PM, Alexander Pivovarov <
> >> >>> apivovarov@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > Does it mean that existing Hive CLI will be killed?
> >> >>> >
> >> >>> > On Mon, Apr 27, 2015 at 3:46 PM, Xuefu Zhang <xzhang@cloudera.com>
> >> >>> > wrote:
> >> >>> >
> >> >>> > > To be precise, the proposal is NOT deprecating, but more of
> >> >>> > > changing the implementation of the Hive CLI using beeline, which
> >> >>> > > seems in consensus.
> >> >>> > >
> >> >>> > > On Mon, Apr 27, 2015 at 2:47 PM, Alexander Pivovarov
> >> >>> > > <apivovarov@gmail.com> wrote:
> >> >>> > >
> >> >>> > > > I just started the survey on Deprecating Hive CLI. Please share
> >> >>> > > > your opinion.
> >> >>> > > >
> >> >>> > > > Deprecating Hive CLI:
> >> >>> > > > https://www.surveymonkey.com/s/XFHLM57
> >> >>> > > >
> >> >>> > > > Results:
> >> >>> > > > https://www.surveymonkey.com/results/SM-JHYY5DR9/
> >> >>> > > >
> >> >>> > > >
> >> >>> > > > On Mon, Apr 27, 2015 at 2:23 PM, Alexander Pivovarov
> >> >>> > > > <apivovarov@gmail.com> wrote:
> >> >>> > > >
> >> >>> > > > > Xuefu,
> >> >>> > > > >
> >> >>> > > > > I'm just saying that most of the shells (e.g. mysql or accumulo)
> >> >>> > > > > reserve -u for user.
> >> >>> > > > >
> >> >>> > > > > I believe lots of stuff in Hive takes MySQL as an example.
> >> >>> > > > >
> >> >>> > > > > Alex
> >> >>> > > > >
> >> >>> > > > >
> >> >>> > > > > On Mon, Apr 27, 2015 at 2:14 PM, Xuefu Zhang
> >> >>> > > > > <xzhang@cloudera.com> wrote:
> >> >>> > > > >
> >> >>> > > > >> Alex,
> >> >>> > > > >>
> >> >>> > > > >> Just to be sure, we are talking about replacing Hive CLI, not
> >> >>> > > > >> mysql and accumulo command line shells. Thus, I'm not sure this
> >> >>> > > > >> is relevant. Regardless, I think we'd better have some writeup
> >> >>> > > > >> in the proposed uber JIRA so that everyone knows what we are
> >> >>> > > > >> signing up for.
> >> >>> > > > >>
> >> >>> > > > >> Thanks,
> >> >>> > > > >> Xuefu
> >> >>> > > > >>
> >> >>> > > > >> On Mon, Apr 27, 2015 at 12:57 PM, Alexander Pivovarov
> >> >>> > > > >> <apivovarov@gmail.com> wrote:
> >> >>> > > > >>
> >> >>> > > > >> > Mysql and accumulo command line shells use -u to pass <user>
> >> >>> > > > >> >
> >> >>> > > > >> > Can beeline use -u as well? Currently -u is reserved for URL?
> >> >>> > > > >> > On Apr 27, 2015 12:42 PM, "Xuefu Zhang" <xzhang@cloudera.com>
> >> >>> > > > >> > wrote:
> >> >>> > > > >> >
> >> >>> > > > >> > > Thanks to all for the input. I assume that we have a consensus
> >> >>> > > > >> > > that we'd like to keep Hive as an alias to beeline with
> >> >>> > > > >> > > embedded HS2 and make user transition as smooth as possible by
> >> >>> > > > >> > > identifying gaps and fixing issues. I'm going to create an
> >> >>> > > > >> > > umbrella JIRA and subtasks to track the progress. Please let
> >> >>> > > > >> > > me know if you have further questions.
> >> >>> > > > >> > >
> >> >>> > > > >> > > Thanks,
> >> >>> > > > >> > > Xuefu
> >> >>> > > > >> > >
> >> >>> > > > >> > > On Sat, Apr 25, 2015 at 12:59 AM, Lars Francke
> >> >>> > > > >> > > <lars.francke@gmail.com> wrote:
> >> >>> > > > >> > >
> >> >>> > > > >> > > > Yes, well put. It is about usability and "least surprise".
> >> >>> > > > >> > > >
> >> >>> > > > >> > > > So if people wouldn't have to deal with JDBC syntax by default
> >> >>> > > > >> > > > and could use "hive" instead of "beeline" to start that'd be
> >> >>> > > > >> > > > good.
> good.
> >> >>> > > > >> > > >
> >> >>> > > > >> > > >
> >> >>> > > > >> > > > On Sat, Apr 25, 2015 at 12:38 AM, Alan Gates
> >> >>> > > > >> > > > <alanfgates@gmail.com> wrote:
> >> >>> > > > >> > > >
> >> >>> > > > >> > > >> If I understand correctly this is an argument about usability,
> >> >>> > > > >> > > >> not functionality. So if Hive still had the CLI but it
> >> >>> > > > >> > > >> happened to use either HS2 or embedded HS2 (depending on
> >> >>> > > > >> > > >> configuration) underneath your concerns would be addressed.
> >> >>> > > > >> > > >> Is that correct?
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> Alan.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >>  Lars Francke <lars.francke@gmail.com>
> >> >>> > > > >> > > >>  April 23, 2015 at 15:53
> >> >>> > > > >> > > >> I've been at about 20 different customers in the years since
> >> >>> > > > >> > > >> Beeline has been added. I can only think of a single one that
> >> >>> > > > >> > > >> has used beeline. The instinct is to use "hive", partially
> >> >>> > > > >> > > >> because it is easy to remember and intuitive and because it is
> >> >>> > > > >> > > >> easier to use. I end up googling the stupid JDBC syntax every
> >> >>> > > > >> > > >> single time.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> I know this might be a bit "out there" but I propose something
> >> >>> > > > >> > > >> else:
> >> >>> > > > >> > > >> 1) Rename (or link) "beeline" to "hive"
> >> >>> > > > >> > > >> 2) Add a "--hiveserver2" (or "--jdbc" or "--beeline") option
> >> >>> > > > >> > > >> to the "hive" command to get the current "beeline", this'd
> >> >>> > > > >> > > >> keep the CLI as default, we could also add a "--legacy" or
> >> >>> > > > >> > > >> "--cli" option and make "hiveserver2/beeline" the default.
> >> >>> > > > >> > > >> 3) Add a "--embedded-hs2" option to the "hive" command to get
> >> >>> > > > >> > > >> an embedded HS2 in Beeline
> >> >>> > > > >> > > >> 4) Add some documentation to beeline reminding people on
> >> >>> > > > >> > > >> startup of beeline on how to connect and how to use embedded
> >> >>> > > > >> > > >> mode
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> The fact is that the old shell just works for lots of people
> >> >>> > > > >> > > >> and there's just no need for beeline for these people. Also
> >> >>> > > > >> > > >> the name is confusing - especially for non-native speakers.
> >> >>> > > > >> > > >> It's not a common word so it's not easy to remember.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >>  Alan Gates <alanfgates@gmail.com>
> >> >>> > > > >> > > >>  April 23, 2015 at 15:35
> >> >>> > > > >> > > >> Xuefu, thanks for getting this discussion started. Limiting
> >> >>> > > > >> > > >> our code paths is definitely a plus. My inclination would be
> >> >>> > > > >> > > >> to go towards option 2. A few questions:
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> 1) Is there any functionality in CLI that's not in beeline?
> >> >>> > > > >> > > >> 2) If I understand correctly option 2 would have an implicit
> >> >>> > > > >> > > >> HS2 in process when a user runs the CLI. Would this be
> >> >>> > > > >> > > >> available in option 1 as well?
> >> >>> > > > >> > > >> 3) Are there any performance implications, since now commands
> >> >>> > > > >> > > >> have to hop through a thrift/jdbc loop even in the embedded
> >> >>> > > > >> > > >> mode?
> >> >>> > > > >> > > >> 4) If we choose option 2 how backward compatible can we make
> >> >>> > > > >> > > >> it? Will users need to change any scripts they have that use
> >> >>> > > > >> > > >> the CLI? Do we have tests that will make sure of this?
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> Alan.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >>  Xuefu Zhang <xzhang@cloudera.com>
> >> >>> > > > >> > > >>  April 23, 2015 at 14:43
> >> >>> > > > >> > > >> Hi all,
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> I'd like to revive the discussion about the fate of Hive CLI,
> >> >>> > > > >> > > >> as this topic has haunted us several times including [1][2].
> >> >>> > > > >> > > >> It looks to me that there is a consensus that it's not wise
> >> >>> > > > >> > > >> for the Hive community to keep both Hive CLI as it is as well
> >> >>> > > > >> > > >> as Beeline + HS2. However, I don't believe that no action is
> >> >>> > > > >> > > >> the best action for us. From discussion so far, I see the
> >> >>> > > > >> > > >> following proposals:
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> 1. Deprecate Hive CLI and advise that users use Beeline.
> >> >>> > > > >> > > >> 2. Make Hive CLI a naming flavor of beeline with embedded
> >> >>> > > > >> > > >> mode.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> Frankly, I don't see much difference between the two
> >> >>> > > > >> > > >> approaches. Keeping an alias at script or even code level
> >> >>> > > > >> > > >> isn't that much work. However, shouldn't we pick a direction
> >> >>> > > > >> > > >> and start moving to it? If there are any gaps between beeline
> >> >>> > > > >> > > >> embedded and Hive CLI, we should identify and fill in those.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> I'd love to hear the thoughts from the community and hope this
> >> >>> > > > >> > > >> time we will have concrete action items to work on.
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> Thanks,
> >> >>> > > > >> > > >> Xuefu
> >> >>> > > > >> > > >>
> >> >>> > > > >> > > >> [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
> >> >>> > > > >> > > >> [2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html
