hive-user mailing list archives

From Ted Xu <...@gopivotal.com>
Subject Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce
Date Thu, 06 Jun 2013 08:36:03 GMT
Hi Shaun,

A dynamic-partition insert that creates a large number of partitions can
slow down the MapReduce job. Can you estimate how many partitions the
insert will generate?
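
One rough way to get that estimate, assuming a sample of the log timestamps
can be pulled out of the source data, is to count the distinct 'YYYY-MM'
keys they map to. The sketch below is only an illustration: the function
name, the timestamp format, and the sample values are hypothetical and not
taken from the actual table.

```python
from collections import Counter
from datetime import datetime


def partition_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count rows per 'YYYY-MM' partition key.

    `fmt` is an assumed timestamp layout; adjust it to match the real logs.
    """
    counts = Counter()
    for ts in timestamps:
        # Reduce each timestamp to the 'YYYY-MM' string used as the partition value.
        counts[datetime.strptime(ts, fmt).strftime("%Y-%m")] += 1
    return counts


# Hypothetical sample values standing in for timestamps read from the logs.
sample = [
    "2013-04-30 23:59:59",
    "2013-05-01 00:00:00",
    "2013-05-17 12:30:00",
    "2013-06-06 08:36:03",
]

counts = partition_counts(sample)
print(len(counts), "distinct partitions")
for key in sorted(counts):
    print(key, counts[key])
```

The number of distinct keys is the number of partitions the insert would
create, and the per-key counts show how evenly the rows are spread.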


On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes <sclowes@atlassian.com> wrote:

> Hi All,
>
> Does anyone know what performance impact dynamic partitions should be
> expected to have?
>
> I have a table that is partitioned by a string in the form 'YYYY-MM'. When
> I insert into this table (from an external table that is just an S3 bucket
> containing gzipped logs) using dynamic partitioning, I get very slow
> performance, with each node in the cluster unable to process more than 2MB
> per second. When I run the exact same query with static partition values, I
> get about 30-40MB/s on each node.
>
> I've never seen this type of problem with our internal cluster running
> Hive 0.7.1 (CDH3u4), but it happens every time in EMR.
>
> Thanks,
> Shaun
>



-- 
Regards,
Ted Xu
