hive-dev mailing list archives

From Thejas Nair <thejas.n...@gmail.com>
Subject Re: [DISCUSS] Deprecating Hive CLI
Date Fri, 01 May 2015 18:04:02 GMT
That sounds fine to me.  My main concern was that we should allow
users to switch back if they encounter some corner-case bugs, for at
least a release or two.
Yes, we can add that warning as well.
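
For illustration, the plan agreed above (new code path on by default, an
env variable to opt out, plus a warning) could be sketched roughly as
below. This is only a sketch under assumptions: the variable name is
borrowed from the CLI_USE_BEELINE example quoted further down, the thread
does not settle on a final name, and the backend labels and function name
are hypothetical, not actual Hive code.

```python
import os

# Hypothetical switch name, borrowed from the CLI_USE_BEELINE example in
# the quoted discussion; the thread does not fix a final name.
SWITCH_VAR = "CLI_USE_BEELINE"


def choose_cli_backend(env):
    """Return (backend, warning) for the "hive" command.

    The beeline-based path is the default. Setting the switch to "false"
    opts back into the old CLI code path and produces a warning that the
    switch may be discontinued in a future release.
    """
    value = env.get(SWITCH_VAR, "true").strip().lower()
    if value == "false":
        warning = (
            "WARNING: %s=false selects the old Hive CLI code path; "
            "this switch may be discontinued in a future release." % SWITCH_VAR
        )
        return "legacy-cli", warning
    return "beeline", None


if __name__ == "__main__":
    backend, warning = choose_cli_backend(os.environ)
    if warning:
        print(warning)
    print("hive command would use:", backend)
```

With this shape, users who hit a corner-case bug can switch back for a
release or two by exporting the variable, and the warning tells them the
escape hatch is temporary.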


On Thu, Apr 30, 2015 at 6:15 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
> Okay. That's fine. I think supporting an env variable doesn't take much.
> What about enabling the new code path by default, and allowing users to
> opt out in case of a serious bug? We can also give users a warning that the
> env variable may be discontinued in the future.
>
> thanks,
> Xuefu
>
> On Thu, Apr 30, 2015 at 5:13 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>
>> In most cases with hive, when a major implementation change is made,
>> we usually let the user fall back to the older implementation. For
>> example, when CBO was added, it was initially not enabled by default,
>> and there is still the option of using the non-CBO path. When new hadoop major
>> versions are added, we still give users the option of using older hadoop
>> versions for some time. Or in the case of jdbc, we allowed users to choose
>> between HiveServer1 and 2 for some time. Even with good effort put
>> into testing, some corner cases sometimes get missed.
>>
>> On similar lines, it would be good to let users opt in for a release, and
>> then switch the default in the next release. Given that we have been
>> making new releases of hive every few months, I don't see this as a
>> big issue. I think we should at the minimum allow users to opt out of
>> the new implementation for a release or so (if they encounter bugs).
>>
>> Most of the work is going to be in ensuring compatibility.
>> Supporting a flag to choose the implementation should be relatively
>> simple work. What do you think?
>>
>> On Thu, Apr 30, 2015 at 4:42 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>> > Hi Thejas,
>> >
>> > Thanks for your input. I thought about this, but I don't really feel it is
>> > necessary to have a "transition" stage. After all, Hive CLI is a command
>> > line tool with well-defined command line options. That's the "interface"
>> > that we need to support. We are just changing the implementation. Through
>> > comprehensive testing, we hope to discover most of the issues.
>> >
>> > On the other hand, if we have such a transition, there might never be a
>> > user bothering to flip the env variable, so the transition wouldn't really
>> > build up more confidence.
>> >
>> > In addition, if we provide either a transition or a switch for every
>> > implementation change, wouldn't users be overwhelmed by those transitions
>> > or switches?
>> >
>> > Thoughts?
>> >
>> > Thanks,
>> > Xuefu
>> >
>> > On Thu, Apr 30, 2015 at 3:10 PM, Thejas Nair <thejas.nair@gmail.com> wrote:
>> >
>> >> Hi Xuefu,
>> >> What is the plan you have in mind for a transition to using beeline
>> >> from within hive?
>> >> I assume there is going to be some translation from hive cli options
>> >> and commands to beeline. Is that right?
>> >> Once the translation is in place, how would the switch happen?
>> >>
>> >> I am thinking that once there is a hive-cli compatible beeline mode,
>> >> there can be an option to switch between the beeline and hive cli codebases.
>> >> For example,
>> >> In hive version X, when the environment variable CLI_USE_BEELINE=true
>> >> is set, the "hive" command uses beeline underneath
>> >> (the default remains the cli codepath, so that users can start experimenting
>> >> with the "hive" command's beeline mode).
>> >> In hive version Y > X, the "hive" command starts using beeline
>> >> underneath by default.
>> >>
>> >> Is this something like what you have in mind?
>> >>
>> >> Thanks,
>> >> Thejas
>> >>
>> >>
>> >>
>> >> On Mon, Apr 27, 2015 at 5:31 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>> >> > FYI, I have created an uber JIRA for this:
>> >> > https://issues.apache.org/jira/browse/HIVE-10511.
>> >> >
>> >> > Thanks,
>> >> > Xuefu
>> >> >
>> >> > On Mon, Apr 27, 2015 at 4:54 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>> >> >
>> >> >> Yes, Olga. I  will create JIRAs to track those.
>> >> >>
>> >> >> Thanks,
>> >> >> Xuefu
>> >> >>
>> >> >> On Mon, Apr 27, 2015 at 4:51 PM, Olga L. Natkovich <olgan@yahoo-inc.com.invalid> wrote:
>> >> >>
>> >> >>> We would need to build a test suite that makes sure that the new
>> >> >>> implementation is compatible with the old one for users to adopt it. We
>> >> >>> would also need some benchmarks to compare performance. Could you please
>> >> >>> include this in the proposal as well?
>> >> >>> Thanks,
>> >> >>> Olga
>> >> >>>       From: Xuefu Zhang <xzhang@cloudera.com>
>> >> >>>  To: "dev@hive.apache.org" <dev@hive.apache.org>
>> >> >>>  Sent: Monday, April 27, 2015 4:46 PM
>> >> >>>  Subject: Re: [DISCUSS] Deprecating Hive CLI
>> >> >>>
>> >> >>> The existing implementation of Hive CLI will be replaced, so that the Hive
>> >> >>> community doesn't need to maintain two code paths for the same thing. That's
>> >> >>> basically what option #2 provides.
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Mon, Apr 27, 2015 at 4:01 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>> >> >>>
>> >> >>> > Does it mean that existing Hive CLI will be killed?
>> >> >>> >
>> >> >>> > On Mon, Apr 27, 2015 at 3:46 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>> >> >>> >
>> >> >>> > > To be precise, the proposal is NOT deprecating, but more of changing the
>> >> >>> > > implementation of the Hive CLI using beeline, which seems in consensus.
>> >> >>> > >
>> >> >>> > > On Mon, Apr 27, 2015 at 2:47 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>> >> >>> > >
>> >> >>> > > > I just started the survey on Deprecating Hive CLI. Please share your
>> >> >>> > > > opinion.
>> >> >>> > > >
>> >> >>> > > > Deprecating Hive CLI:
>> >> >>> > > > https://www.surveymonkey.com/s/XFHLM57
>> >> >>> > > >
>> >> >>> > > > Results:
>> >> >>> > > > https://www.surveymonkey.com/results/SM-JHYY5DR9/
>> >> >>> > > >
>> >> >>> > > >
>> >> >>> > > > On Mon, Apr 27, 2015 at 2:23 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>> >> >>> > > >
>> >> >>> > > > > Xuefu,
>> >> >>> > > > >
>> >> >>> > > > > I'm just saying that most of the shells (e.g. mysql or accumulo) reserve
>> >> >>> > > > > -u for user.
>> >> >>> > > > >
>> >> >>> > > > > I believe lots of stuff in Hive takes MySQL as an example.
>> >> >>> > > > >
>> >> >>> > > > > Alex
>> >> >>> > > > >
>> >> >>> > > > >
>> >> >>> > > > > On Mon, Apr 27, 2015 at 2:14 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>> >> >>> > > > >
>> >> >>> > > > >> Alex,
>> >> >>> > > > >>
>> >> >>> > > > >> Just to be sure, we are talking about replacing Hive CLI, not mysql and
>> >> >>> > > > >> accumulo command line shells. Thus, I'm not sure this is relevant.
>> >> >>> > > > >> Regardless, I think we'd better have some writeup in the proposed uber JIRA
>> >> >>> > > > >> so that everyone knows what we are signing up for.
>> >> >>> > > > >>
>> >> >>> > > > >> Thanks,
>> >> >>> > > > >> Xuefu
>> >> >>> > > > >>
>> >> >>> > > > >> On Mon, Apr 27, 2015 at 12:57 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
>> >> >>> > > > >>
>> >> >>> > > > >> > Mysql and accumulo command line shells use -u to pass <user>
>> >> >>> > > > >> >
>> >> >>> > > > >> > Can beeline use -u as well? Currently -u is reserved for URL?
>> >> >>> > > > >> > On Apr 27, 2015 12:42 PM, "Xuefu Zhang" <xzhang@cloudera.com> wrote:
>> >> >>> > > > >> >
>> >> >>> > > > >> > > Thanks to all for the input. I assume that we have a consensus that we'd
>> >> >>> > > > >> > > like to keep Hive as an alias to beeline with embedded HS2 and make user
>> >> >>> > > > >> > > transition as smooth as possible by identifying gaps and fixing issues. I'm
>> >> >>> > > > >> > > going to create an umbrella JIRA and subtasks to track the progress. Please
>> >> >>> > > > >> > > let me know if you have further questions.
>> >> >>> > > > >> > >
>> >> >>> > > > >> > > Thanks,
>> >> >>> > > > >> > > Xuefu
>> >> >>> > > > >> > >
>> >> >>> > > > >> > > On Sat, Apr 25, 2015 at 12:59 AM, Lars Francke <lars.francke@gmail.com> wrote:
>> >> >>> > > > >> > >
>> >> >>> > > > >> > > > Yes, well put. It is about usability and "least surprise".
>> >> >>> > > > >> > > >
>> >> >>> > > > >> > > > So if people wouldn't have to deal with JDBC syntax by default and could
>> >> >>> > > > >> > > > use "hive" instead of "beeline" to start that'd be good.
>> >> >>> > > > >> > > >
>> >> >>> > > > >> > > > On Sat, Apr 25, 2015 at 12:38 AM, Alan Gates <alanfgates@gmail.com> wrote:
>> >> >>> > > > >> > > >
>> >> >>> > > > >> > > >> If I understand correctly this is an argument about usability, not
>> >> >>> > > > >> > > >> functionality.  So if Hive still had the CLI but it happened to use either
>> >> >>> > > > >> > > >> HS2 or embedded HS2 (depending on configuration) underneath your concerns
>> >> >>> > > > >> > > >> would be addressed.  Is that correct?
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> Alan.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >>  Lars Francke <lars.francke@gmail.com>
>> >> >>> > > > >> > > >>  April 23, 2015 at 15:53
>> >> >>> > > > >> > > >> I've been at about 20 different customers in the years since Beeline has
>> >> >>> > > > >> > > >> been added. I can only think of a single one that has used beeline. The
>> >> >>> > > > >> > > >> instinct is to use "hive", partially because it is easy to remember and
>> >> >>> > > > >> > > >> intuitive and because it is easier to use. I end up googling the stupid
>> >> >>> > > > >> > > >> JDBC syntax every single time.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> I know this might be a bit "out there" but I propose something else:
>> >> >>> > > > >> > > >> 1) Rename (or link) "beeline" to "hive"
>> >> >>> > > > >> > > >> 2) Add a "--hiveserver2" (or "--jdbc" or "--beeline") option to the
>> >> >>> > > > >> > > >> "hive" command to get the current "beeline", this'd keep the CLI as
>> >> >>> > > > >> > > >> default, we could also add a "--legacy" or "--cli" option and make
>> >> >>> > > > >> > > >> "hiveserver2/beeline" the default.
>> >> >>> > > > >> > > >> 3) Add a "--embedded-hs2" option to the "hive" command to get an embedded
>> >> >>> > > > >> > > >> HS2 in Beeline
>> >> >>> > > > >> > > >> 4) Add some documentation to beeline reminding people on startup of
>> >> >>> > > > >> > > >> beeline on how to connect and how to use embedded mode
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> The fact is that the old shell just works for lots of people and there's
>> >> >>> > > > >> > > >> just no need for beeline for these people. Also the name is confusing -
>> >> >>> > > > >> > > >> especially for non-native speakers. It's not a common word so it's not easy
>> >> >>> > > > >> > > >> to remember.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >>  Alan Gates <alanfgates@gmail.com>
>> >> >>> > > > >> > > >>  April 23, 2015 at 15:35
>> >> >>> > > > >> > > >> Xuefu, thanks for getting this discussion started. Limiting our code
>> >> >>> > > > >> > > >> paths is definitely a plus.  My inclination would be to go towards option
>> >> >>> > > > >> > > >> 2.  A few questions:
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> 1) Is there any functionality in CLI that's not in beeline?
>> >> >>> > > > >> > > >> 2) If I understand correctly option 2 would have an implicit HS2 in
>> >> >>> > > > >> > > >> process when a user runs the CLI.  Would this be available in option 1 as
>> >> >>> > > > >> > > >> well?
>> >> >>> > > > >> > > >> 3) Are there any performance implications, since now commands have to hop
>> >> >>> > > > >> > > >> through a thrift/jdbc loop even in the embedded mode?
>> >> >>> > > > >> > > >> 4) If we choose option 2 how backward compatible can we make it? Will
>> >> >>> > > > >> > > >> users need to change any scripts they have that use the CLI?  Do we have
>> >> >>> > > > >> > > >> tests that will make sure of this?
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> Alan.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >>  Xuefu Zhang <xzhang@cloudera.com>
>> >> >>> > > > >> > > >>  April 23, 2015 at 14:43
>> >> >>> > > > >> > > >> Hi all,
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> I'd like to revive the discussion about the fate of Hive CLI, as this
>> >> >>> > > > >> > > >> topic has haunted us several times including [1][2]. It looks to me that
>> >> >>> > > > >> > > >> there is a consensus that it's not wise for the Hive community to keep both
>> >> >>> > > > >> > > >> Hive CLI as it is as well as Beeline + HS2. However, I don't believe that
>> >> >>> > > > >> > > >> no action is the best action for us. From discussion so far, I see the
>> >> >>> > > > >> > > >> following proposals:
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> 1. Deprecate Hive CLI and advise that users use Beeline.
>> >> >>> > > > >> > > >> 2. Make Hive CLI a naming flavor of beeline with embedded mode.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> Frankly, I don't see much difference between the two approaches. Keeping
>> >> >>> > > > >> > > >> an alias at script or even code level isn't that much work. However,
>> >> >>> > > > >> > > >> shouldn't we pick a direction and start moving to it? If there are any gaps
>> >> >>> > > > >> > > >> between beeline embedded and Hive CLI, we should identify and fill in
>> >> >>> > > > >> > > >> those.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> I'd love to hear the thoughts from the community and hope this time we
>> >> >>> > > > >> > > >> will have concrete action items to work on.
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> Thanks,
>> >> >>> > > > >> > > >> Xuefu
>> >> >>> > > > >> > > >>
>> >> >>> > > > >> > > >> [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201412.mbox/%3C5485E1BE.3060709%40hortonworks.com%3E
>> >> >>> > > > >> > > >> [2] https://www.mail-archive.com/dev@hive.apache.org/msg112378.html
