hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Metastore performance on HDFS-backed table with 15000+ partitions
Date Sat, 22 Feb 2014 11:48:23 GMT
The query optimizer in Hive is awful on memory consumption. 15k partitions sounds
a bit early for it to fail, though.

What is your heap size?
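
If it is still at the default, it is worth setting the client heap explicitly
before digging further. A minimal sketch (assumes a CDH 4-era layout; the exact
variables can differ per distribution, and your_table is a placeholder):

    # HADOOP_HEAPSIZE (in MB) is read by the bin/hadoop launcher that
    # bin/hive delegates to; HADOOP_CLIENT_OPTS is appended to the client
    # JVM's flags, so a later -Xmx there typically wins if both are set.
    export HADOOP_HEAPSIZE=4096
    export HADOOP_CLIENT_OPTS="-Xmx4g $HADOOP_CLIENT_OPTS"
    hive -e "select * from your_table limit 10"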

Regards,
Terje

> On 22 Feb 2014, at 12:05, Norbert Burger <norbert.burger@gmail.com> wrote:
> 
> Hi folks,
> 
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
> 
> In Hive, we have an external table backed by HDFS with a 3-level partitioning
> scheme that currently has 15000+ partitions.
> 
> Within the last day or so, queries against this table have started failing.  A
> simple query which shouldn't take very long at all (select * from ... limit 10)
> fails after several minutes with a client OOME.  I get the same outcome on
> count(*) queries (which I thought wouldn't send any data back to the client).
> Increasing heap on both client and server JVMs (via HADOOP_HEAPSIZE) doesn't
> have any impact.
> 
> We were only able to work around the client OOMEs by reducing the number of
> partitions in the table.
> 
> Looking at the MySQL querylog, my thought is that the Hive client is quite busy
> making requests for partitions that don't contribute to the query.  Has anyone
> else had a similar experience with tables this size?
> 
> Thanks,
> Norbert
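
On the querylog point: one quick sanity check is to count the partitions straight
out of the MySQL metastore. A sketch, assuming the standard metastore schema
(TBLS/PARTITIONS tables) and a metastore database named "metastore":

    # Lists the ten most heavily partitioned tables in the metastore.
    mysql -u hive -p metastore -e "
      SELECT t.TBL_NAME, COUNT(*) AS num_partitions
      FROM PARTITIONS p
      JOIN TBLS t ON p.TBL_ID = t.TBL_ID
      GROUP BY t.TBL_NAME
      ORDER BY num_partitions DESC
      LIMIT 10;"

If the count lines up with the partition fetches you see in the querylog, the
client is likely retrieving metadata for every partition rather than pruning.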
