hive-user mailing list archives

From Shaun Clowes <sclo...@atlassian.com>
Subject Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce
Date Thu, 06 Jun 2013 08:24:20 GMT
Hi All,

Does anyone know what performance impact dynamic partitions should be
expected to have?

I have a table that is partitioned by a string in the form 'YYYY-MM'. When
I insert into this table (from an external table that is just an S3 bucket
containing gzipped logs) using dynamic partitioning, I get very slow
performance, with each node in the cluster unable to process more than 2MB
per second. When I run the exact same query with static partition values, I
get about 30-40MB/s on each node.
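
For illustration, the two forms of the insert look roughly like the sketch
below. The table and column names (parsed_logs, raw_logs, log_time, message)
and the partition column name (month) are placeholders, not the real schema,
and log_time is assumed to be a string that starts with 'YYYY-MM'.

```sql
-- Dynamic partitioning: Hive derives the partition value from the last
-- column of the SELECT. Names below are hypothetical placeholders.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE parsed_logs PARTITION (month)
SELECT log_time, message, substr(log_time, 1, 7) AS month
FROM raw_logs;

-- Static partitioning: the partition value is fixed in the statement,
-- so one statement is needed per 'YYYY-MM' value.
INSERT INTO TABLE parsed_logs PARTITION (month = '2013-05')
SELECT log_time, message
FROM raw_logs
WHERE substr(log_time, 1, 7) = '2013-05';
```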

I've never seen this type of problem with our internal cluster running Hive
0.7.1 (CDH3u4), but it happens every time in EMR.

Thanks,
Shaun
