hive-user mailing list archives

From Norbert Burger <norbert.bur...@gmail.com>
Subject Re: Metastore performance on HDFS-backed table with 15000+ partitions
Date Thu, 27 Feb 2014 12:57:05 GMT
Thanks everyone for the feedback.  Just to follow up in case someone else
runs into this: I can confirm that local client works around the OOMEs, but
it's still very slow.

It does seem like we were hitting some combination of HIVE-4051 and
HIVE-5158.  We'll try reducing partition count first, and then switch to
0.12.0 if that doesn't improve things significantly.
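
In case it's useful to others, the cleanup we have in mind is just dropping
older partitions, roughly as sketched below (the table name and partition
keys are made up for illustration; our real table uses a 3-level scheme).
Since the table is external, dropping a partition only removes the metastore
entry - the files stay on HDFS.

    -- hypothetical table and partition keys, for illustration only
    ALTER TABLE events DROP PARTITION (year='2013', month='01', day='15');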

Fwiw - http://www.slideshare.net/oom65/optimize-hivequeriespptx also has
some good rules-of-thumb.

Norbert


On Sat, Feb 22, 2014 at 1:27 PM, Stephen Sprague <spragues@gmail.com> wrote:

> Yeah. That traceback pretty much spells it out - it's metastore related, and
> that's where the partitions are stored.
>
> I'm with the others on this. HiveServer2 is still a little janky on
> memory management.  I bounce mine once a day at midnight just to play it
> safe (and because I can).
>
> Again, for me, I use the Hive local client for production jobs and the
> remote client for ad hoc stuff.
>
> You may wish to confirm that the local Hive client has no problem with your
> query.
>
> Other than that, you can either increase the heap size on the HS2 process
> and hope for the best, and/or file a bug report.
>
> Bottom line: HiveServer2 isn't production bulletproof just yet, IMHO.
> Others may disagree.
>
> Regards,
> Stephen.
>
>
>
> On Sat, Feb 22, 2014 at 9:50 AM, Norbert Burger <norbert.burger@gmail.com> wrote:
>
>> Thanks all for the quick feedback.
>>
>> I'm a bit surprised to learn 15k is considered too many, but we can work
>> around it.  I guess I'm also curious why the query planner needs to know
>> about all partitions even in the case of simple select/limit queries, where
>> the query might target only a single partition.
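>>
>> For concreteness, a minimal sketch of the two query shapes involved (the
>> partition column names are made up; our real scheme has 3 levels):
>>
>>   select * from events limit 10;
>>   select * from events where year='2014' and month='02' and day='22' limit 10;
>>
>> My naive expectation was that at least the second form would only need
>> metadata for the single partition it touches.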
>>
>> Here's the client-side OOME with HADOOP_HEAPSIZE=2048:
>>
>>
>> https://gist.githubusercontent.com/nburger/3286d2052060e2efe161/raw/dc30231086803c1d33b9137b5844d2d0e20e350d/gistfile1.txt
>>
>> This was from a CDH 4.3.0 client hitting HiveServer2.  Any idea what's
>> consuming the heap?
>>
>> Norbert
>>
>>
>> On Sat, Feb 22, 2014 at 10:32 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>
>>> Don't make tables with that many partitions. It is an anti-pattern. I
>>> have tables with 2000 partitions a day and that is really too many. Hive
>>> needs to load that information into memory to plan the query.
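>>>
>>> As a rough sketch (table layout and names purely illustrative), one way to
>>> keep the partition count down is to partition by day only and keep the
>>> finer-grained fields as ordinary columns:
>>>
>>>   CREATE EXTERNAL TABLE events_daily (
>>>     event_time STRING,
>>>     source STRING,
>>>     payload STRING
>>>   )
>>>   PARTITIONED BY (dt STRING)
>>>   LOCATION '/data/events_daily';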
>>>
>>>
>>> On Saturday, February 22, 2014, Terje Marthinussen <
>>> tmarthinussen@gmail.com> wrote:
>>> > The query optimizer in Hive is awful on memory consumption. 15k partitions
>>> sounds a bit early for it to fail, though.
>>> >
>>> > What is your heap size?
>>> >
>>> > Regards,
>>> > Terje
>>> >
>>> >> On 22 Feb 2014, at 12:05, Norbert Burger <norbert.burger@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi folks,
>>> >>
>>> >> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>>> >>
>>> >> In Hive, we have an external table backed by HDFS which has a 3-level
>>> partitioning scheme that currently has 15000+ partitions.
>>> >>
>>> >> Within the last day or so, queries against this table have started
>>> failing.  A simple query which shouldn't take very long at all (select *
>>> from ... limit 10) fails after several minutes with a client OOME.  I get
>>> the same outcome on count(*) queries (which I thought wouldn't send any
>>> data back to the client).  Increasing heap on both client and server JVMs
>>> (via HADOOP_HEAPSIZE) doesn't have any impact.
>>> >>
>>> >> We were only able to work around the client OOMEs by reducing the
>>> number of partitions in the table.
>>> >>
>>> >> Looking at the MySQL query log, my thought is that the Hive client is
>>> quite busy making requests for partitions that don't contribute to the
>>> query.  Has anyone else had similar experience with tables this size?
>>> >>
>>> >> Thanks,
>>> >> Norbert
>>> >
>>>
>>> --
>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>> than usual.
>>>
>>
>>
>
