hive-user mailing list archives

From David Lerman <dler...@videoegg.com>
Subject Re: Hive and thrift session help
Date Tue, 08 Sep 2009 22:58:33 GMT
I believe what you're looking for is being worked on and tracked here:

http://issues.apache.org/jira/browse/HIVE-80


On 9/8/09 6:37 PM, "Vijay" <techvd@gmail.com> wrote:

> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at the
> hadoop level? It seems to me that this may be something that needs to be built
> first.
> 
> Thanks,
> Vijay
> 
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce Rollins <rrollins@attinteractive.com> wrote:
>>>> OK I see. I just looked at the code in HWISessionManager.java. So it looks
>>>> like either I will have to write my own ruby HWISessionManager that manages
>>>> sessions through thrift or expose the existing HWISessionManager via some
>>>> web service interface. Has anyone done this?
>>>> 
>>>> Royce
>>>> 
>>>> 
>>>> On 9/8/09 1:47 PM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:
>>>> 
>>>>>> On Tue, Sep 8, 2009 at 4:38 PM, Vijay <techvd@gmail.com> wrote:
>>>>>>>> Sorry to inject into this thread but I have the same problem (only I'm
>>>>>>>> trying to use the thrift PHP libraries from apache-php scripts). The
>>>>>>>> problem with this approach is that the http request cannot run
>>>>>>>> indefinitely as the server is executing a query. Are there any
>>>>>>>> solutions for this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vijay
>>>>>>>>
>>>>>>>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <rrollins@attinteractive.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Raghu,
>>>>>>>>>> Thanks for the quick response.
>>>>>>>>>> Yes. My application is web based so instead of having to build some
>>>>>>>>>> kind of session model myself for queries that might take a while, I'd
>>>>>>>>>> like to use a session model in the hive service.
>>>>>>>>>> 
>>>>>>>>>> Royce
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 9/8/09 1:32 PM, "Raghu Murthy" <rmurthy@facebook.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> Our model so far has been to create a new connection to the hive
>>>>>>>>>>>> thrift server per session. Is there anything specific you are looking
>>>>>>>>>>>> for in sessions?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 9/8/09 1:06 PM, "Royce Rollins" <rrollins@attinteractive.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm currently working on an application that connects to hive via
>>>>>>>>>>>> the thrift ruby libraries.
>>>>>>>>>>>>
>>>>>>>>>>>> Does hive support creation of sessions using those libraries? If
>>>>>>>>>>>> so, how?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Royce
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> Royce,
>>>>>> 
>>>>>> The Hive Web Interface deals with this by having a threaded object
>>>>>> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>>>>> has any equivalent to threading and Application Scope.
>>>>>> 
>>>>>> Edward
>>>> 
>>>> 
>> 
>> Someone correct me if I am wrong.
>> 
>> Royce,
>> 
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>> 
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()', but you could start a query and set flags on completion.
>> 
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web service
>> style bindings they would really be independent of the back end. Do
>> you want to work on this? I would like to open a Jira and tackle the
>> issue.
>> 
>> 
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern, how should
>> it handle it?
>> 
>> One way to tackle this is
>> 
>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>> 
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>> 
>> What is the usage pattern you need?
> 
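
The submit / poll / kill / fetch "job" pattern asked about in this thread, combined with the idea of spawning the query as a separate process, could be sketched roughly as below. This is not a Hive API. The function names (submit, status, kill, output), the JOB_DIR location and the on-disk file layout are all made up for illustration, and the code assumes a POSIX system. State is kept on disk rather than in memory so that a stateless front end can return right after submitting and check back in a later request.

```python
import os
import signal
import subprocess
import sys
import tempfile
import uuid

# Hypothetical state directory; any storage the web tier can read would do.
JOB_DIR = os.path.join(tempfile.gettempdir(), "query_jobs_sketch")

# Wrapper run as a detached process: it runs the real command, captures its
# output in <base>.out and records the exit code in <base>.rc.
WRAPPER = r"""
import os, subprocess, sys
base, argv = sys.argv[1], sys.argv[2:]
with open(base + ".out", "wb") as out:
    try:
        rc = subprocess.call(argv, stdout=out, stderr=subprocess.STDOUT)
    except OSError:
        rc = 127  # the command could not be started
with open(base + ".rc.tmp", "w") as f:
    f.write(str(rc))
os.replace(base + ".rc.tmp", base + ".rc")  # publish the result atomically
"""


def _base(job_id):
    return os.path.join(JOB_DIR, job_id)


def submit(argv):
    """Start argv in the background and return a job id immediately."""
    os.makedirs(JOB_DIR, exist_ok=True)
    job_id = uuid.uuid4().hex
    proc = subprocess.Popen(
        [sys.executable, "-c", WRAPPER, _base(job_id)] + list(argv),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own session and process group (POSIX only)
    )
    with open(_base(job_id) + ".pid", "w") as f:
        f.write(str(proc.pid))
    return job_id


def status(job_id):
    """Return 'running', 'done', 'failed', 'killed' or 'unknown'."""
    if os.path.exists(_base(job_id) + ".rc"):
        with open(_base(job_id) + ".rc") as f:
            rc = f.read().strip()
        if rc == "killed":
            return "killed"
        return "done" if rc == "0" else "failed"
    if os.path.exists(_base(job_id) + ".pid"):
        return "running"
    return "unknown"


def kill(job_id):
    """Send SIGTERM to the wrapper and the command it started."""
    with open(_base(job_id) + ".pid") as f:
        pid = int(f.read())
    try:
        os.killpg(pid, signal.SIGTERM)  # the wrapper leads its own group
    except ProcessLookupError:
        return  # already gone
    if not os.path.exists(_base(job_id) + ".rc"):
        with open(_base(job_id) + ".rc", "w") as f:
            f.write("killed")


def output(job_id):
    """Return whatever the command has written so far, as text."""
    with open(_base(job_id) + ".out", "rb") as f:
        return f.read().decode("utf-8", errors="replace")
```

For Hive, argv would presumably be a `bin/hive -e` invocation with the query text, as described in the quoted message. A real service would also need to validate job ids and clean up old job files, which this sketch does not attempt.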

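The "record the last position it saw" idea from the quoted message can be illustrated with a small offset-based reader. This is a sketch under stated assumptions: it reads an ordinary local file as a stand-in for the HDFS path (reading from HDFS would need an HDFS client, which is not shown), and read_new is a hypothetical helper name. The caller keeps the returned offset between requests, which is what lets a stateless client pick up where it left off.

```python
import os


def read_new(path, offset):
    """Return (complete_new_lines, new_offset) for data appended since offset."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return b"", offset  # the query has not produced any output yet
    if size < offset:
        offset = 0  # file was truncated or replaced, so start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Hand back only whole lines; a partially written last row is left for
    # the next call.
    end = data.rfind(b"\n") + 1
    return data[:end], offset + end
```

Each call returns only the rows that were completed since the previous call, so the client can poll on a timer and append what it gets to the page.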
