hive-user mailing list archives

From Manish Malhotra <manish.malhotra.w...@gmail.com>
Subject Re: Reg: <Real Time Hive query>
Date Wed, 26 Dec 2012 03:01:04 GMT
Hi,

As Nitin and others have already mentioned, there are a few points you need
to consider.

1. Hive is currently built for OLAP applications, not for OLTP (real-time
workloads of the kind served by an RDBMS such as MySQL or Oracle).

2. Although you can connect to the Hive Thrift server through the JDBC
implementation, it is still not a production-grade API, because it does not
scale for concurrent clients.
reference:  https://cwiki.apache.org/Hive/hiveserver.html

The current JDBC driver executes HQL queries through the Hive server.
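
Since the original question asked for a Java program, here is a minimal sketch of a JDBC client that talks to the Hive server. It assumes the original HiveServer driver class (org.apache.hadoop.hive.jdbc.HiveDriver) with the hive-jdbc jar and its dependencies on the classpath, and a server listening on port 10000. The class name HiveJdbcSketch, its helper methods, and the table name mentioned below are hypothetical, not something from this thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class HiveJdbcSketch {
    // Assumption: driver class of the original HiveServer JDBC client.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a HiveServer JDBC URL such as jdbc:hive://localhost:10000/default.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    // Runs one HQL query and returns the first column of every row as text.
    // Each call opens and closes its own connection; the query is executed
    // by the Hive server, so latency depends on the underlying job.
    static List<String> firstColumn(String url, String hql)
            throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER); // fails if the hive-jdbc jar is not on the classpath
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(hql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }
}
```

From a main method in Eclipse you could call something like firstColumn(jdbcUrl("localhost", 10000, "default"), "SELECT * FROM sample_table LIMIT 10"), where sample_table is a placeholder for one of your own tables. Given the concurrency limits described above, this is better suited to a single batch client than to many simultaneous users.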

3. A real-time query system also requires sophisticated locking protocols,
whereas Hive implements only the very basic locking that it needs.

4. The Hive Metastore is also not very scalable right now, as it can run into
an OOM exception once the number of partitions grows.
ref: https://issues.apache.org/jira/browse/HIVE-2907

5. The Hive Metastore client doesn't have retry logic, so when the CLI or
Hive Server is connected to the Hive Metastore and the connection drops or
there is some network issue, it cannot reconnect automatically.

ref:
http://mail-archives.apache.org/mod_mbox/hive-user/201211.mbox/%3CCA+FBdFT20nnQ5pOMcJ0ctE8RRseVFxxJO4qjAgxD1doBc+Wr6Q@mail.gmail.com%3E
https://issues.apache.org/jira/browse/HIVE-3400 (it looks like this is fixed,
but I don't know which version will include the fix)
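
Until a fix is available in the version you run, one option is to retry at the application level. The following is only a sketch of a generic caller-side retry wrapper with a doubling delay. It is not the metastore client's own behaviour and not the HIVE-3400 change; RetrySketch and withRetry are hypothetical names.

```java
import java.util.concurrent.Callable;

class RetrySketch {
    // Calls the action up to maxAttempts times. After each failure except the
    // last it sleeps, doubling the delay every time. If every attempt fails,
    // the exception from the final attempt is rethrown.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

The action passed in would be whatever opens your connection or runs your query. The attempt count and delay are tuning choices that depend on how long your network interruptions usually last.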

6. You need to architect the real-time query system around Hive by using
other technologies such as MySQL (RDBMS), a cache, etc. (by pushing
aggregated data to the RDBMS or cache layer), and then let users query at
least one level of aggregated / grouped data, to reduce the amount of data
to be queried.
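
To make the last point concrete, here is a small sketch of the idea: a batch job (for example a scheduled Hive query) produces rows, they are reduced to one level of aggregation, and interactive lookups are served from that layer instead of from Hive. An in-memory map stands in for the RDBMS or cache; AggregateCacheSketch, PageView and the country / views fields are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregateCacheSketch {
    // One raw row as a batch query might return it (hypothetical shape).
    static final class PageView {
        final String country;
        final long views;

        PageView(String country, long views) {
            this.country = country;
            this.views = views;
        }
    }

    // Aggregated layer: total views per country. Not thread-safe in this sketch.
    private final Map<String, Long> viewsByCountry = new HashMap<>();

    // Batch side: replace the cached totals with a fresh GROUP BY country.
    void load(List<PageView> rows) {
        Map<String, Long> fresh = new HashMap<>();
        for (PageView row : rows) {
            fresh.merge(row.country, row.views, Long::sum);
        }
        viewsByCountry.clear();
        viewsByCountry.putAll(fresh);
    }

    // Interactive side: answered from the cached totals, never from Hive.
    long views(String country) {
        return viewsByCountry.getOrDefault(country, 0L);
    }
}
```

In a real deployment the load step would write to MySQL or a cache service rather than a local map, and the refresh interval determines how stale the answers can be.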

There are other points as well, namely the obvious and well-known
differences between Hive and an RDBMS.

So please consider the points above when making your decision and designing
the system.
Hopefully this will help.

Regards,
Manish



On Tue, Dec 25, 2012 at 2:06 AM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> Hive is not like MySQL, where you just query and get the results. It will
> take time, depending on data size and query. You may look at Oozie if you
> want to build an application, or look at Pentaho with Hive integration.
>
> The Hive CLI is not only for testing. You can build applications using the
> Hive CLI and scripting languages.
>
> You can use the Hive Thrift server and use it like JDBC, but keep in mind
> this is never real time.
> On Dec 25, 2012 3:24 PM, "Kshiva Kps" <kshivakps@gmail.com> wrote:
>
>> Many thanks for your reply.
>>
>> But in practice, if you want to develop applications (jobs), the CLI won't
>> help us; the CLI is for testing. Please correct me if I'm wrong, thanks.
>>
>>  Many Thanks
>> Kshiva@ +91 9940163885
>>
>> On Tue, Dec 25, 2012 at 2:09 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>
>>> Hive comes with a Thrift server, so you can connect via JDBC.
>>>
>>> If you just want to execute queries, why don't you use the Hive CLI?
>>>
>>>
>>> On Tue, Dec 25, 2012 at 1:01 PM, Kshiva Kps <kshivakps@gmail.com> wrote:
>>>
>>>> Thanks... sorry to ask, but if possible, could you please advise on the
>>>> points below?
>>>>
>>>>>
>>>>> In general, how do we write scripts in real-world use?
>>>>> 1. Java + Hive query -- could you please, if possible, share one program
>>>>> that can be executed through the Eclipse IDE? Many thanks.
>>>>>
>>>>> Thanks
>>>>> Siva @09940163885
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
