From: Bill Graham
Reply-To: billgraham@gmail.com
To: hive-user@hadoop.apache.org
Date: Wed, 26 Aug 2009 14:50:20 -0700
Subject: Re: Adding jar files when running hive in hwi mode or hiveserver mode
Message-ID: <449b48760908261450t671bc4c7q43982a1b5aaad0f0@mail.gmail.com>
References: <68B7689C98024D43B4C2709456F0B5200A1B275FEC@SC-MBXC1.TheFacebook.com>

+1 for the HWI -> HiveServer approach.

Building out rich APIs in the HiveServer (Thrift currently, and possibly REST at some point) would allow the HiveServer to focus on the functional API. The HWI (and others) could then focus on rich UI functionality. The two would be cleanly decoupled, which would reduce the complexity of the codebases and help abide by the KISS principle.



On Wed, Aug 26, 2009 at 2:42 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
On Wed, Aug 26, 2009 at 3:25 PM, Raghu Murthy <rmurthy@facebook.com> wrote:
> Even if we decided to have multiple HiveServers, wouldn't it be possible for
> HWI to randomly pick a HiveServer to connect to per query/client?
>
> On 8/26/09 12:16 PM, "Ashish Thusoo" <athusoo@facebook.com> wrote:
>
>> +1 for ajaxing this baby.
>>
>> On the broader question of whether we should combine HWI and HiveServer - I
>> think there are definite deployment and code reuse advantages in doing so,
>> however keeping them separate also has the advantage that we can cluster
>> HiveServers independently from HWI. Since the HiveServer sits in the data
>> path, the independent scaling may have advantages. I am not sure how strong of
>> an argument that is to not put them together. Simplicity obviously indicates
>> that we should have them together.
>>
>> Thoughts?
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> Sent: Wednesday, August 26, 2009 9:45 AM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Adding jar files when running hive in hwi mode or hiveserver mode
>>
>> On Tue, Aug 25, 2009 at 8:13 PM, Vijay<techvd@gmail.com> wrote:
>>> Yep, I got it and now it works perfectly! I like hwi btw! It
>>> definitely makes things easier for a wider audience to try out hive.
>>> Your new session result bucket idea is very nice as well. I will keep
>>> trying more things and see if anything else comes up but so far it looks
>>> great!
>>> Thanks Edward!
>>>
>>> On Tue, Aug 25, 2009 at 7:25 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>
>>>> On Tue, Aug 25, 2009 at 10:18 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>> On Mon, Aug 24, 2009 at 10:13 PM, Vijay<techvd@gmail.com> wrote:
>>>>>> Probably spoke too soon :) I added this comment to the JIRA ticket
>>>>>> above.
>>>>>>
>>>>>> Hi, I tried the latest patch on trunk and there seems to be a problem.
>>>>>>
>>>>>> I was interested in using the "add jar" command to add jar files
>>>>>> to the path. However, by the time the command flows through the
>>>>>> SessionState to the AddResourceProcessor (in
>>>>>> ./ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java),
>>>>>> the command word "add" is not being stripped so the
>>>>>> resource processor is trying to find a ResourceType of "ADD."
>>>>>>
>>>>>> I'm not sure if this was an existing bug or was a result of the
>>>>>> current set of changes.
>>>>>>
>>>>>> On Mon, Aug 24, 2009 at 5:30 PM, Vijay <techvd@gmail.com> wrote:
>>>>>>>
>>>>>>> That's awesome and looks like exactly what I needed. Local file
>>>>>>> system requirement is perfectly ok for now. I will check it out right
>>>>>>> away!
>>>>>>> Hopefully it will be checked in soon.
>>>>>>>
>>>>>>> Thanks Edward!
>>>>>>>
>>>>>>> On Mon, Aug 24, 2009 at 5:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Aug 24, 2009 at 8:09 PM, Prasad Chakka <pchakka@facebook.com> wrote:
>>>>>>>>> Vijay, there is no solution for it yet. There may be a jira
>>>>>>>>> open but AFAIK, no one is working on it. You are welcome to
>>>>>>>>> contribute this feature.
>>>>>>>>>
>>>>>>>>> Prasad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> From: Vijay <techvd@gmail.com>
>>>>>>>>> Reply-To: <hive-user@hadoop.apache.org>
>>>>>>>>> Date: Mon, 24 Aug 2009 16:59:28 -0700
>>>>>>>>> To: <hive-user@hadoop.apache.org>
>>>>>>>>> Subject: Re: Adding jar files when running hive in hwi mode or
>>>>>>>>> hiveserver mode
>>>>>>>>>
>>>>>>>>> Hi, is there any solution for this? How does everybody include
>>>>>>>>> custom jar files running hive in a non-cli mode?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Vijay
>>>>>>>>>
>>>>>>>>> On Sat, Aug 22, 2009 at 6:19 PM, Vijay <techvd@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> When I run hive in cli mode, I add the hive_contrib.jar file
>>>>>>>>> using this command:
>>>>>>>>>
>>>>>>>>> hive> add jar lib/hive_contrib.jar
>>>>>>>>>
>>>>>>>>> Is there a way to do this automatically when running hive in
>>>>>>>>> hwi or hiveserver modes? Or do I have to add the jar file
>>>>>>>>> explicitly to any of the startup scripts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Vijay,
>>>>>>>>
>>>>>>>> Currently HWI does not support this. The changes in
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-716 will make this
>>>>>>>> possible (I have not tested it, but it should work as the cli
>>>>>>>> does). The file will have to be in the server's local file
>>>>>>>> system. We could probably add 'commons upload' to the web
>>>>>>>> interface if there was a need for it.
>>>>>>>>
>>>>>>>> HIVE-716 should be in trunk soon. It does apply cleanly if it's
>>>>>>>> something you need today.
>>>>>>>>
>>>>>>>> Edward
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> I just committed a new version of the patch. You were correct: the
>>>>> clidriver trims the first token off "set" and "add" queries, and hwi
>>>>> was not doing that. Also let me know your impressions of HWI.
>>>>>
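The fix described above amounts to dropping the leading command word before the rest of the line reaches the resource processor. A minimal Python sketch of that idea follows. It is not Hive's code: `strip_command_word` and `parse_add_command` are hypothetical names, and the set of resource types is an assumption made for illustration.

```python
def strip_command_word(line):
    """Return (command_word, remainder) for a line such as 'add jar lib/x.jar'."""
    parts = line.strip().split(None, 1)
    if not parts:
        return "", ""
    command = parts[0].lower()
    remainder = parts[1] if len(parts) > 1 else ""
    return command, remainder


# Assumed set of resource types, for illustration only.
RESOURCE_TYPES = {"JAR", "FILE"}


def parse_add_command(line):
    """Parse 'add <type> <path>' after trimming the leading 'add' token."""
    command, remainder = strip_command_word(line)
    if command != "add":
        raise ValueError("not an add command: %r" % line)
    tokens = remainder.split(None, 1)
    if len(tokens) != 2:
        raise ValueError("usage: add <type> <path>")
    resource_type = tokens[0].upper()
    if resource_type not in RESOURCE_TYPES:
        # Without the trimming step, the word 'ADD' itself would end up here.
        raise ValueError("unknown resource type: %s" % resource_type)
    return resource_type, tokens[1].strip()
```

If the trimming step is skipped, the first token seen as the resource type is "ADD", which matches the symptom reported on the JIRA ticket.
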
>>>>> The new features are the 'ResultBucket', a buffer of the last x
>>>>> results viewable from the web interface, and the ability to supply
>>>>> more than one query at a time.
>>>>>
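A buffer of the last x results can be pictured as a fixed-size queue that discards the oldest row when a new one arrives. The Python sketch below illustrates that behaviour only. `ResultBuffer` is a hypothetical stand-in, not the HWI ResultBucket class, and its method names are assumptions.

```python
from collections import deque


class ResultBuffer:
    """Keeps only the most recent `capacity` result rows (hypothetical stand-in)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._rows = deque(maxlen=capacity)

    def add(self, row):
        # When the deque is full, appending silently drops the oldest row.
        self._rows.append(row)

    def rows(self):
        # Oldest retained row first, newest last.
        return list(self._rows)
```

A web page could then render `rows()` directly, which is what makes short commands such as explain or show tables usable without writing results to a file.
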
>>>>> These two features should add much usability now as you can do
>>>>> things like explain, show tables, etc. and not have to dump the
>>>>> results to a file.
>>>>>
>>>>> Edward
>>>>>
>>>>
>>>> False statement:
>>>>>> I just committed a new version of the patch
>>>>
>>>> In actuality, I updated the Jira with a new patch.
>>>>
>>>> It is still early AM. Not all the gears are turning yet.
>>>>
>>>> Edward
>>>
>>>
>>
>> Vijay,
>>
>>>> It definitely makes things easier for a wider audience to try out
>>>> hive
>>
>> That was always the goal. I often wonder which direction we should take HWI
>> in.
>> Should HWI have some REST-ful stubs to turn it into a remote job submission
>> system?
>> HiveServer uses thrift and I believe thrift has an HTTP-Transport so you might
>> not need HWI to provide this.
>>
>> Should we ajax things like the result bucket or the entire interface so it has
>> that ooo aaahhh effect?
>>
>> Really, the larger question: HWI has its own multi-session management, and
>> HiveServer has this as well (now; way back when it did not). Should HWI just
>> front-end HiveServer?
>>
>> Does anyone have any thoughts?
>> Edward
>
>

I think Raghu is correct. HiveClient->HiveServer happens on a
permanent TCP connection (I think?). If you had a back-end cluster of
HiveServers behind a load balancer or proxy with a
sticky-session/session-tracking/source-IP policy, HWI would be
configured with the virtual IP address of the load balancer and would
connect and stay connected to a random HiveServer in the farm.
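
A source-IP policy of the kind mentioned above can be sketched in a few lines of Python. This is an illustration of the general technique, not of any particular load balancer: `pick_backend` is a hypothetical name, and hashing the client address is only one assumed way to get stickiness.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Source-IP stickiness: the same client IP always maps to the same backend."""
    if not backends:
        raise ValueError("no backends configured")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and fold it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Because the mapping depends only on the client address and the backend list, a client such as HWI keeps landing on the same HiveServer for as long as the farm is unchanged.
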

I am naturally partial to the way it is now because I came up with it :)

I like the idea of having a REST-ful/XML-RPC or some web service style
interface for job submit.

My thinking behind HWI has always been KISS. Keep It Simple Stupid.
Anyone should be able to hack a few web pages onto it. Adding thrift,
ajax, XML-RPC layers definitely ups the complexity.

I think it makes sense to do HWI->HiveServer. I will have to take a
deeper look at what HiveServer and thrift offer to be sure.

Edward
