hive-user mailing list archives

From Daniel Haviv <daniel.ha...@veracity-group.com>
Subject Re: Extremely Slow Data Loading with 40k+ Partitions
Date Thu, 16 Apr 2015 18:55:56 GMT
Is this a test environment?
If so, can you try and disable concurrency?
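In this context, disabling concurrency presumably means turning off Hive's lock manager for the session, since acquiring locks across tens of thousands of partitions can stall the load step. A minimal sketch (the property name is standard Hive configuration; verify the default for your version):

```sql
-- Disable Hive's concurrency/lock support for this session only;
-- lock acquisition over 40k+ partitions can dominate load time.
SET hive.support.concurrency=false;
```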


Daniel

> On 16 Apr 2015, at 19:44, Tianqi Tong <ttong@brightedge.com> wrote:
> 
> Hi Daniel,
> Actually the mapreduce job was just fine, but the process got stuck on the data loading after that.
> The output stopped at:
> Loading data to table default.parquet_table_with_40k_partitions partition (yearmonth=null, prefix=null)
>  
> When I look at the size of the table's HDFS files, I can see the size is growing, but it's kind of slow.
> For mapreduce job, I had 400+ mappers and 100+ reducers.
>  
> Thanks
> Tianqi
>  
> From: Daniel Haviv [mailto:daniel.haviv@veracity-group.com] 
> Sent: Wednesday, April 15, 2015 9:23 PM
> To: user@hive.apache.org
> Subject: Re: Extremely Slow Data Loading with 40k+ Partitions
>  
> How many reducers are you using?
> 
> Daniel
> 
> On 16 Apr 2015, at 00:55, Tianqi Tong <ttong@brightedge.com> wrote:
> 
> Hi,
> I'm loading data into a Parquet table with dynamic partitions. I have 40k+ partitions, and I have skipped the partition stats computation step.
> Somehow it's still extremely slow loading data into the partitions (800MB/h).
> Do you have any hints on the possible reason and solution?
>  
> Thank you
> Tianqi Tong
>  
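For reference, a sketch of the session settings typically involved in a dynamic-partition load like the one described above (the property names are standard Hive configuration; the limit values are illustrative assumptions, not recommendations):

```sql
-- Allow dynamic partitioning without requiring a static partition key.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Raise the dynamic-partition ceilings above the 40k+ being created
-- (illustrative values; the defaults are far lower).
SET hive.exec.max.dynamic.partitions=50000;
SET hive.exec.max.dynamic.partitions.pernode=10000;

-- Skip automatic stats gathering during the load,
-- as the original poster mentions doing.
SET hive.stats.autogather=false;
```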
