hive-user mailing list archives

From Pradeep Gollakota
Subject Re: Very slow dynamic partition load
Date Thu, 11 Jun 2015 22:01:52 GMT
I actually decided to remove one of my 2 partition columns and make it a
bucketing column instead... same query completed fully in under 10 minutes
with 92 partitions added. This will suffice for me for now.
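
For reference, the "2 more days" estimate in my earlier message quoted below can be sanity-checked with some quick arithmetic. This is only a rough sketch: it assumes the metastore keeps adding partitions at a constant rate, and it uses the approximate figures from this thread (about 32000 partitions in total, 10830 added after roughly 24 hours). The function and variable names are made up for illustration.

```python
# Rough check of the partition-commit estimate from this thread.
# Assumption: the metastore adds partitions at a constant rate.

def estimate_remaining_hours(total, added, elapsed_hours):
    """Hours left if partitions keep being added at the observed rate."""
    rate_per_hour = added / elapsed_hours
    return (total - added) / rate_per_hour

TOTAL_PARTITIONS = 32000   # "about 32000 partitions generated"
ADDED_SO_FAR = 10830       # count reported by the metastore
ELAPSED_HOURS = 24         # time since the Hive job itself completed

remaining_hours = estimate_remaining_hours(
    TOTAL_PARTITIONS, ADDED_SO_FAR, ELAPSED_HOURS
)
seconds_per_partition = ELAPSED_HOURS * 3600 / ADDED_SO_FAR

print(f"remaining: {remaining_hours:.1f} h ({remaining_hours / 24:.2f} days)")
print(f"average cost: {seconds_per_partition:.1f} s per partition")
```

At that rate each partition costs about 8 seconds on average, and the remaining partitions would need a little under 2 days, which matches the estimate.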

On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota wrote:

> Hmm... did your performance increase with the patch you supplied? I do
> need the partitions in Hive, but I have a separate tool that has the
> ability to add partitions to the metastore and is definitely much faster
> than this. I just checked my job again, the actual Hive job completed 24
> hours ago and has been adding the dynamic partitions to the metastore since
> then and is still not done. According to the metastore, there are only 10830
> partitions added so far... at this pace, it will take approximately 2 more
> days for it to complete.
> On Thu, Jun 11, 2015 at 1:18 PM, Slava Markeyev wrote:
>> This is something that a few of us have run into. I think the bottleneck
>> is in partition creation calls to the metastore. My workaround was
>> HIVE-10385, which optionally removed partition creation in the metastore, but
>> this isn't a solution for everyone. If you don't require actual partitions
>> in the table but simply partitioned data in HDFS, give it a shot. It may be
>> worthwhile looking into optimizations for this use case.
>> -Slava
>> On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota wrote:
>>> Hi All,
>>> I have a table which is partitioned on two columns (customer, date). I'm
>>> loading some data into the table using a Hive query. The MapReduce job
>>> completed within a few minutes and needs to "commit" the data to the
>>> appropriate partitions. There were about 32000 partitions generated. The
>>> commit phase has been running for almost 16 hours and has not finished yet.
>>> I've been monitoring jmap, and don't believe it's a memory or GC issue.
>>> I've also been looking at jstack and am not sure why it's so slow. I'm not
>>> sure what the problem is, but it seems to be a Hive performance issue when it
>>> comes to "highly partitioned" tables.
>>> Any thoughts on this issue would be greatly appreciated.
>>> Thanks in advance,
>>> Pradeep
>> --
>> Slava Markeyev | Engineering | Upsight
