hive-user mailing list archives

From Rishi Aggarwal <ri...@hike.in>
Subject Re: How to perform hive moveTask in parallel?
Date Sun, 21 May 2017 06:36:04 GMT
Thanks Prasanth Jayachandran!

Will try with Hive 2.1.0.

We were using Qubole with Hive version 1.2.0, and there the file move happens
in parallel. Any idea why it works in Qubole?



On Sun, May 21, 2017 at 11:51 AM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> You are looking for https://issues.apache.org/jira/browse/HIVE-12988
>
> This was added in the Hive 2.1.0 release.
>
> Thanks
> Prasanth
>
> On May 21, 2017, at 1:13 AM, Rishi Aggarwal <rishi@hike.in> wrote:
>
> I am running an INSERT OVERWRITE query on an external table which is
> partitioned (192 partitions).
>
> On running EXPLAIN, I see there are mainly two stages:
>
>    1. MR stage (8 mappers and 10 reducers)
>    2. Move Stage
>
> The MR stage completes in 15-20 minutes.
>
> The move stage takes about *3 hours*.
>
> Looking further, I found that the reducers write to a temporary location,
> and the move stage then moves the files to the target location. The move
> from temp to target happens sequentially, and since I have 192 partitions
> and 10 reducers, it takes 3 hours to move all the files.
>
> Is there a way to do the move in parallel?
>
> Hive Version: 1.2.1
>
>
>
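
For illustration only, the difference between a sequential and a parallel move
stage can be sketched as below. This is not Hive's implementation: it moves
files on a local filesystem with a Python thread pool, and the names
move_file, move_all and the staging/<partition>/<file> layout are
hypothetical. In Hive 2.1.0 and later the number of move threads is, as far as
I know, controlled by the hive.mv.files.thread setting; please verify that
against the configuration documentation for your version.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_file(src, target_dir):
    """Move one file into target_dir and return its new path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(target_dir / src.name)))


def move_all(staging, target, threads=8):
    """Move every staging/<partition>/<file> to target/<partition>/<file>.

    threads <= 0 moves the files one at a time, like the sequential move
    stage described in the thread; a positive value uses a thread pool.
    The default of 8 is arbitrary.
    """
    # Assumes a single partition level directly under the staging directory.
    jobs = [
        (f, target / f.parent.name)
        for f in sorted(staging.glob("*/*"))
        if f.is_file()
    ]
    if threads <= 0:
        return [move_file(src, dst) for src, dst in jobs]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each job is independent, so the moves can overlap in time.
        return list(pool.map(lambda job: move_file(*job), jobs))
```

Threads suit this kind of work because each move mostly waits on the
filesystem rather than the CPU; how much it helps in practice depends on the
storage backend.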
