hive-user mailing list archives

From Tianqi Tong <tt...@brightedge.com>
Subject RE: Extremely Slow Data Loading with 40k+ Partitions
Date Thu, 16 Apr 2015 16:44:07 GMT
Hi Daniel,
Actually the MapReduce job was just fine, but the process got stuck on the data loading after
that.
The output stopped at:
Loading data to table default.parquet_table_with_40k_partitions partition (yearmonth=null, prefix=null)

When I look at the size of the table's HDFS files, I can see the size is growing, but only
slowly.
The MapReduce job had 400+ mappers and 100+ reducers.
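
The load phase that stalls here is the step after the MapReduce job finishes, where Hive moves
the job output into each partition directory and registers the partitions with the metastore.
A rough sketch of the kind of statement that produces the log line above (the source table and
non-partition columns are illustrative, inferred only from the partition keys yearmonth and
prefix):

    -- illustrative dynamic-partition insert; col1, col2 and source_table are hypothetical
    INSERT OVERWRITE TABLE default.parquet_table_with_40k_partitions
    PARTITION (yearmonth, prefix)
    SELECT col1, col2, yearmonth, prefix
    FROM source_table;

With 40k+ dynamic partitions, the per-partition file moves and metastore calls in this final
step can dominate the total runtime even when the MapReduce job itself is fast.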

Thanks
Tianqi

From: Daniel Haviv [mailto:daniel.haviv@veracity-group.com]
Sent: Wednesday, April 15, 2015 9:23 PM
To: user@hive.apache.org
Subject: Re: Extremely Slow Data Loading with 40k+ Partitions

How many reducers are you using?
Daniel

On 16 Apr 2015, at 00:55, Tianqi Tong <ttong@brightedge.com> wrote:
Hi,
I'm loading data into a Parquet table with dynamic partitions. I have 40k+ partitions, and I
have skipped the partition stats computation step.
Somehow it's still extremely slow loading data into the partitions (800MB/h).
Do you have any hints on the possible reason and solution?
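
For reference, the setup described above (a dynamic-partition insert with the stats step
skipped) corresponds to session settings along these lines; this is a minimal sketch, since
the exact settings used aren't shown in this thread:

    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    -- limits raised to allow 40k+ partitions in a single insert
    SET hive.exec.max.dynamic.partitions = 50000;
    SET hive.exec.max.dynamic.partitions.pernode = 50000;
    -- presumably how the partition stats computation step was skipped
    SET hive.stats.autogather = false;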

Thank you
Tianqi Tong
