hive-user mailing list archives

From Udit Mehta <ume...@groupon.com>
Subject Re: Hive Metastore Bottleneck
Date Wed, 30 Mar 2016 22:33:23 GMT
But don't clients always pick the first URI among the instances listed in
the "hive.metastore.uris" config, and fall back to the others only if the
first is unreachable? If so, we would still have a bottleneck, right?
Can you give a little more information on your setup and how you enable
load balancing?
I think I am missing something here.

Thanks,
Udit

On Wed, Mar 30, 2016 at 3:20 PM, Gautam <gautamkowshik@gmail.com> wrote:

> The metastore service is a Java process that runs a Thrift server, so you
> can run multiple such Hive metastore instances with
> "javax.jdo.option.ConnectionURL" pointing to the same MySQL DB.
>
> On Wed, Mar 30, 2016 at 3:11 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>>
>>
>> Can you clarify this, please?
>>
>> "Have you tried putting multiple metastores behind a load balancer"
>>
>> Are you implying that the metastore and the backend DB are different
>> entities here?
>>
>> As far as I know, $HIVE_HOME/bin/hive --service metastore & starts Hive
>> threads to the backend database/metastore, and HiveServer2 acts as a gateway
>> for remote access to the Hive metastore through beeline or other clients.
>>
>> There is only one metastore here, namely MySQL/Oracle or others.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 30 March 2016 at 22:53, Gautam <gautamkowshik@gmail.com> wrote:
>>
>>> Can you elaborate on where you see the bottleneck? A general overview
>>> of your access path would be useful, for instance whether you're accessing
>>> the Hive metastore via HiveServer2, from WebHCat using the embedded CLI, or
>>> something else.
>>>
>>> Have you tried putting multiple metastores behind a load balancer? It's
>>> just a Thrift service over MySQL, so you can have multiple instances
>>> pointing to the same backend DB.
>>>
>>> On Wed, Mar 30, 2016 at 2:28 PM, Udit Mehta <umehta@groupon.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are currently running Hive in production and staging with the
>>>> metastore connecting to a MySQL database in the backend. Production sees
>>>> more metastore traffic than staging, which is expected.
>>>> We have had a sudden increase in traffic, which has led to metastore
>>>> operations taking a lot longer than before. The same query on staging takes
>>>> a lot less time because of the lighter traffic on the staging cluster.
>>>>
>>>> We tried increasing the heap space for the metastore process and also
>>>> bumped up the memory for the MySQL database. Neither change seemed to
>>>> help much, and we still see delays. Is there any other config we can
>>>> increase to counter this increased traffic? I am looking at the config for
>>>> max threads as well, but I'm not sure if this is the right path ahead.
>>>>
>>>> I'm wondering if the metastore is a bottleneck here or if I'm missing
>>>> something.
>>>>
>>>> Looking forward to your reply,
>>>> Udit
>>>>
>>>
>>>
>>>
>>> --
>>> "If you really want something in this life, you have to work for it.
>>> Now, quiet! They're about to announce the lottery numbers..."
>>>
>>
>>
>
>
> --
> "If you really want something in this life, you have to work for it. Now,
> quiet! They're about to announce the lottery numbers..."
>

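The concern raised at the top of this message (every client picking the first entry in "hive.metastore.uris" and failing over only when it is unreachable) can be illustrated with a small simulation. This is a sketch, not Hive code: it does not claim to reproduce what the Hive client actually does, the host names are hypothetical, and pick_sequential and pick_shuffled are made-up helpers that model the two selection policies being discussed. Port 9083 is the usual metastore Thrift port.

```python
import random
from collections import Counter

# Hypothetical metastore instances, all assumed to share one backend database.
URIS = [
    "thrift://ms1.example.com:9083",
    "thrift://ms2.example.com:9083",
    "thrift://ms3.example.com:9083",
]


def pick_sequential(uris, is_up):
    """Return the first reachable URI, trying them in configured order."""
    for uri in uris:
        if is_up(uri):
            return uri
    raise ConnectionError("no metastore instance is reachable")


def pick_shuffled(uris, is_up, rng):
    """Shuffle a private copy of the list, then fail over in that order."""
    candidates = list(uris)
    rng.shuffle(candidates)
    return pick_sequential(candidates, is_up)


def all_up(uri):
    """Reachability check for the simulation: every instance is healthy."""
    return True


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    clients = 3000
    sequential = Counter(pick_sequential(URIS, all_up) for _ in range(clients))
    shuffled = Counter(pick_shuffled(URIS, all_up, rng) for _ in range(clients))
    print("sequential:", dict(sequential))
    print("shuffled:  ", dict(shuffled))
```

Under the sequential policy every simulated client lands on the first instance while it is healthy, so adding instances only buys failover. Spreading load requires either randomizing the order per client or fronting the instances with a load balancer, as suggested earlier in the thread.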