hive-user mailing list archives

From Tianqi Tong <>
Subject RE: Extremely Slow Data Loading with 40k+ Partitions
Date Thu, 16 Apr 2015 16:44:07 GMT
Hi Daniel,
Actually the MapReduce job ran fine, but the process got stuck on the data loading afterward.
The output stopped at:
Loading data to table default.parquet_table_with_40k_partitions partition (yearmonth=null,

When I look at the size of the table's HDFS files, I can see it growing, but only slowly.
The MapReduce job had 400+ mappers and 100+ reducers.
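One way to quantify "growing, but only slowly" is to sample the table directory size twice and compute a rate. A minimal sketch (the two sizes below are made-up sample values, and the warehouse path is an assumption based on Hive's default warehouse location):

```shell
# Two hypothetical size samples (bytes), taken one hour apart, e.g. from:
#   hdfs dfs -du -s /user/hive/warehouse/parquet_table_with_40k_partitions
size_t0=1677721600   # first sample: 1600 MB
size_t1=2516582400   # one hour later: 2400 MB

# Convert the delta to MB/h (1048576 bytes per MB).
awk -v a="$size_t0" -v b="$size_t1" 'BEGIN { printf "%.0f MB/h\n", (b-a)/1048576 }'
# → 800 MB/h
```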


From: Daniel Haviv []
Sent: Wednesday, April 15, 2015 9:23 PM
Subject: Re: Extremely Slow Data Loading with 40k+ Partitions

How many reducers are you using?

On 16 Apr 2015, at 00:55, Tianqi Tong <<>> wrote:
I'm loading data into a Parquet table with dynamic partitions. I have 40k+ partitions, and I
have skipped the partition stats computation step.
Somehow it's still extremely slow loading data into the partitions (800MB/h).
Do you have any hints on the possible reason and solution?
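For context, a load like this is typically a dynamic-partition insert along the following lines. This is a sketch, not the exact statement from the thread: the table name and partition column come from the log line above, the source table and column list are placeholders, and the SET lines are standard Hive knobs that large dynamic-partition loads commonly require (the stats setting matches the "skipped the partition stats computation" step mentioned above):

```sql
-- Allow dynamic partitioning without a static leading partition key.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Raise the dynamic-partition limits above the 40k+ partitions created here.
SET hive.exec.max.dynamic.partitions=50000;
SET hive.exec.max.dynamic.partitions.pernode=50000;
-- Skip per-partition stats computation during the load.
SET hive.stats.autogather=false;

INSERT OVERWRITE TABLE parquet_table_with_40k_partitions
PARTITION (yearmonth)
SELECT col1, col2, yearmonth   -- placeholder column list
FROM source_table;             -- placeholder source table
```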

Thank you
Tianqi Tong
