hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: Metastore performance on HDFS-backed table with 15000+ partitions
Date Sat, 22 Feb 2014 04:18:07 GMT
Most interesting. We had an issue recently with running out of heap while
querying a table with 15K columns, but not with 15K partitions.

15K partitions shouldn't be causing a problem, in my humble estimation.
Maybe a million would, but not 15K. :)

So is there a traceback we can look at? Or is it not heap but real (resident) memory?

And is this the local Hive client, or HiveServer?
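
(If it helps with reproducing it, here's a minimal sketch for grabbing a heap
dump from the local client; this assumes the OOM is in the hive CLI's own JVM.
HADOOP_CLIENT_OPTS and the -XX flags are standard Hadoop/JVM knobs, but the
heap size and dump path below are just placeholders:)

    # raise the client heap and dump it on OOM so we can see what's filling it
    export HADOOP_CLIENT_OPTS="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hive-client.hprof"
    hive -e "select * from your_table limit 10"
    # then inspect /tmp/hive-client.hprof with jhat or any other heap analyzer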

Thanks,
Stephen.



On Fri, Feb 21, 2014 at 7:05 PM, Norbert Burger <norbert.burger@gmail.com> wrote:

> Hi folks,
>
> We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
>
> In Hive, we have an external table backed by HDFS which has a 3-level
> partitioning scheme that currently has 15000+ partitions.
>
> Within the last day or so, queries against this table have started
> failing.  A simple query which shouldn't take very long at all (select *
> from ... limit 10) fails after several minutes with a client OOME.  I get
> the same outcome on count(*) queries (which I thought wouldn't send any
> data back to the client).  Increasing heap on both client and server JVMs
> (via HADOOP_HEAPSIZE) doesn't have any impact.
>
> We were only able to work around the client OOMEs by reducing the number
> of partitions in the table.
>
> Looking at the MySQL query log, my thought is that the Hive client is quite
> busy making requests for partitions that don't contribute to the query.
>  Has anyone else had a similar experience with tables this size?
>
> Thanks,
> Norbert
>
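
(On the partition-fetch point above: one thing worth trying is pinning the
partition columns in the WHERE clause, so the client should only ask the
metastore for the matching partitions rather than all 15K. A rough sketch,
assuming a hypothetical table "events" partitioned by dt/hr/source; those
names are made up, so adjust to your actual layout:)

    # restrict on the partition columns so Hive can prune at the metastore
    hive -e "select * from events where dt='2014-02-21' and hr='23' and source='web' limit 10"
    # a count(*) restricted the same way should only touch the matching partitions
    hive -e "select count(*) from events where dt='2014-02-21'"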
