hive-user mailing list archives

From Sujeet Pardeshi <Sujeet.Parde...@sas.com>
Subject RE: How to perform hive moveTask in parallel?
Date Mon, 22 May 2017 06:15:00 GMT
You should create the partitions locally in HDFS first. Then move these partitions to your
external location with a copy command (is it an S3 bucket?). You should see a large gain in
performance.
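
A minimal sketch of the "write locally, then copy in parallel" idea, assuming the partitions
were first written as one directory each under a staging root. The names `copy_partition`,
`copy_partitions_in_parallel`, `staging_root`, and `target_root` are hypothetical, and the copy
itself uses `shutil.copytree` on ordinary directories as a stand-in; for a real HDFS to S3
transfer the `copy_fn` would have to call whatever copy tool your cluster provides.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def copy_partition(src: Path, dst_root: Path) -> Path:
    """Copy one partition directory under dst_root, keeping its name.

    Stand-in only: this copies between ordinary directories. Replace it
    with a call to your cluster's copy tool for an HDFS -> S3 transfer.
    """
    dst = dst_root / src.name
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
    return dst


def copy_partitions_in_parallel(staging_root, target_root,
                                max_workers=16, copy_fn=copy_partition):
    """Copy every partition directory under staging_root concurrently."""
    staging_root = Path(staging_root)
    target_root = Path(target_root)
    target_root.mkdir(parents=True, exist_ok=True)

    # One task per partition directory (e.g. "dt=2017-05-21").
    partitions = sorted(p for p in staging_root.iterdir() if p.is_dir())

    copied = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(copy_fn, p, target_root) for p in partitions]
        for future in as_completed(futures):
            copied.append(future.result())  # re-raises any copy error
    return sorted(copied)
```

Threads are used because the work is I/O-bound. After such a copy, the external table would
still need to learn about the new partition directories, for example through
`MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`; that step is not shown here.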


Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra, 411 013
off: +91-20-49118448
 "When the solution is simple, God is answering…"

From: Rishi Aggarwal [mailto:rishi@hike.in]
Sent: 21 May 2017 AM 11:44
To: user@hive.apache.org
Subject: How to perform hive moveTask in parallel?


I am running an INSERT OVERWRITE query on an external table that is partitioned (192 partitions).

Running EXPLAIN on the query shows two main stages:
1.      MR stage (8 mappers and 10 reducers)
2.      Move Stage

The MR stage completes in 15-20 minutes.

The move stage takes about 3 hours.

Looking further, I found that the reducers write to a temporary location, and the move stage
then moves the files to the target location. The move from the temporary location to the target
happens sequentially, and since I have 192 partitions and 10 reducers, it takes 3 hours to move
all the files.
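
A rough back-of-the-envelope reading of these numbers, as a sketch only. It assumes the worst
case in which every reducer writes one file into every partition, which the figures above do
not confirm, and it assumes ideal scaling when the moves run concurrently. The names
`max_files`, `seconds_per_file`, and `estimated_move_seconds` are hypothetical.

```python
PARTITIONS = 192
REDUCERS = 10
MOVE_STAGE_SECONDS = 3 * 60 * 60  # about 3 hours, as observed

# Upper bound on output files: one file per reducer per partition (assumption).
max_files = PARTITIONS * REDUCERS

# Average cost of one sequential move under that assumption.
seconds_per_file = MOVE_STAGE_SECONDS / max_files


def estimated_move_seconds(parallel_movers: int) -> float:
    """Idealized move time if the same work were split across N movers."""
    return MOVE_STAGE_SECONDS / parallel_movers


print(max_files)                   # 1920
print(seconds_per_file)            # 5.625
print(estimated_move_seconds(16))  # 675.0
```

Several seconds per file is far more than a metadata-only rename would normally cost, so under
these assumptions the per-file overhead of the target store, not the data volume, would
dominate the move stage.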

Is there a way to do the move in parallel?

Hive Version: 1.2.1