hive-user mailing list archives

From Rishi Aggarwal <ri...@hike.in>
Subject How to perform hive moveTask in parallel?
Date Sun, 21 May 2017 06:13:50 GMT
I am running an INSERT OVERWRITE query on an external table that is
partitioned (192 partitions).

Running EXPLAIN on the query shows two main stages:

   1. MR stage (8 mappers and 10 reducers)
   2. Move Stage

The MR stage completes in 15-20 minutes.

The Move stage takes about *3 hours*.

Looking further, I found that the reducers write to a temporary location,
and the Move stage then moves the files to the target location. The moves
from the temporary location to the target happen sequentially, so with
192 partitions and 10 reducers it takes 3 hours to move all the files.
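To make the question concrete, here is a minimal sketch of the difference between moving staged partition directories one at a time and moving them concurrently. It is a local-filesystem analogue only, not Hive's MoveTask or the HDFS API: the partition names (`p=0`, `p=1`, ...), the reducer file names, the helper functions and the worker count are all hypothetical. Whether concurrency actually helps in practice depends on the underlying filesystem.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_fake_staging(staging_root, num_partitions, files_per_partition):
    """Create hypothetical partition dirs (p=0..N-1), each holding one file per reducer."""
    partitions = []
    for i in range(num_partitions):
        name = f"p={i}"
        part_dir = os.path.join(staging_root, name)
        os.makedirs(part_dir)
        for r in range(files_per_partition):
            # Hypothetical reducer output names: 000000_0, 000001_0, ...
            with open(os.path.join(part_dir, f"{r:06d}_0"), "w") as f:
                f.write("row\n")
        partitions.append(name)
    return partitions


def move_partition(staging_root, target_root, partition):
    """Move one partition directory, replacing any existing one (overwrite semantics)."""
    src = os.path.join(staging_root, partition)
    dst = os.path.join(target_root, partition)
    if os.path.exists(dst):
        shutil.rmtree(dst)  # drop the old partition data first
    os.replace(src, dst)  # a rename when src and dst are on the same filesystem
    return partition


def move_all(staging_root, target_root, partitions, workers=1):
    """workers=1 moves partitions one at a time; workers>1 moves them concurrently."""
    os.makedirs(target_root, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda p: move_partition(staging_root, target_root, p), partitions))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "tmp_staging")
        target = os.path.join(root, "table")
        parts = make_fake_staging(staging, 192, 10)
        moved = move_all(staging, target, parts, workers=8)
        print(len(moved), "partitions moved")
```

Each partition directory is independent of the others, which is why the moves can be handed to a thread pool without coordination; `workers=1` reproduces the sequential behaviour described above.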

Is there a way to do move in parallel?

Hive Version: 1.2.1
