hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: Hive query on Tez slower than on MR (fails in some cases) ..
Date Fri, 19 Feb 2016 06:34:17 GMT

> On Tez, this is run as a single DAG of M-R+ ...

Can't tell which vertex is the slow one in this.

There's more tooling for isolating which vertex (and which task) is taking up the time

or alternatively run

The first one should get you a graph which looks a lot like

and the 2nd one should get you something which looks like
(note skewed tail in Reducer 3)
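If you only have raw task timings rather than the graph, a skewed tail like the one in Reducer 3 can be spotted with a minimal Python sketch like the one below. It assumes you have already exported per-task durations for each vertex into a plain dict. The input shape, the `find_skewed_vertices` name and the 3x threshold are hypothetical, not part of the Tez tooling. It flags a vertex when its slowest task runs much longer than its median task.

```python
from statistics import median


def find_skewed_vertices(task_secs, ratio=3.0):
    """Return (vertex, max/median) pairs, worst first, for every vertex whose
    slowest task takes at least `ratio` times its median task duration.

    task_secs: hypothetical dict of vertex name -> list of task durations in
    seconds (e.g. exported by hand from the job's task history).
    """
    skewed = []
    for vertex, durations in task_secs.items():
        if not durations:
            continue  # vertex with no finished tasks: nothing to compare
        typical = median(durations)
        worst = max(durations)
        if typical > 0 and worst / typical >= ratio:
            skewed.append((vertex, worst / typical))
    # Most skewed vertex first, so the tail to investigate is at the top.
    skewed.sort(key=lambda pair: pair[1], reverse=True)
    return skewed
```

With hypothetical timings such as `{"Map 1": [30, 32, 31, 29], "Reducer 3": [20, 22, 21, 600]}`, only Reducer 3 is flagged, because one task runs about 28x longer than the median. The max/median ratio is used rather than the mean because a single straggler drags the mean up and hides itself.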

> It gets stuck due to some large apps in the 1st Reducer Phase while
>holding all subsequent 12 Reducer phases until the final Reducer in the
>2nd phase is finished.

You're splitting the sort buffers 12-way.

> Are there things in Tez I can leverage or change my query to make it
>conducive for Tez to deal with skew better?

If left unconfigured, Tez usually runs all containers using the Mapper Xmx values. Most of the time a perf difference is reported, it's due to the use of 1.5GB containers (vs 6GB reducers in MRv2).
Assuming that isn't the case, produce the other SVGs; they should tell me
exactly what's wrong.
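The container-size fallback mentioned above can be sketched as follows. This assumes Hive's behaviour of using `hive.tez.container.size` when it is set to a positive value and otherwise reusing the mapper setting `mapreduce.map.memory.mb` (Hadoop default 1024 MB) for every vertex. The `effective_tez_container_mb` helper and the plain-dict config are hypothetical, for illustration only.

```python
# Hadoop's default for mapreduce.map.memory.mb when nothing is configured.
DEFAULT_MAP_MEMORY_MB = 1024


def effective_tez_container_mb(conf):
    """Container size (MB) requested for every Tez task, mapper or reducer.

    conf: hypothetical dict of property name -> string value.
    """
    tez_size = int(conf.get("hive.tez.container.size", -1))
    if tez_size > 0:
        return tez_size
    # Unset or -1: fall back to the *mapper* size, even for reducer vertices.
    return int(conf.get("mapreduce.map.memory.mb", DEFAULT_MAP_MEMORY_MB))
```

With the 1.5GB mapper / 6GB reducer settings from the message (`mapreduce.map.memory.mb=1536`, `mapreduce.reduce.memory.mb=6144`), this returns 1536 for the reducer vertices too, while MRv2 reducers would get 6144. Setting `hive.tez.container.size` explicitly (with `hive.tez.java.opts` to match) closes that gap.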

Tez doesn't introduce skew in general, but dividing io.sort.mb into 12
chunks might be a problem.
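A rough back-of-the-envelope for the 12-way split follows. It assumes the configured sort buffer is shared evenly across the task's 12 sorted outputs. Tez's actual memory distribution weights and scales the requests, so treat this as an approximation. The helper names and the example sizes are hypothetical.

```python
import math


def per_output_sort_mb(io_sort_mb, num_outputs):
    """Sort buffer (MB) each output gets if io.sort.mb is split evenly."""
    return io_sort_mb / num_outputs


def estimated_spills(output_mb, buffer_mb):
    """Lower-bound estimate of sort-buffer spills for `output_mb` of output.

    Ignores the spill threshold and per-record metadata overhead, so the
    real spill count is likely higher.
    """
    return math.ceil(output_mb / buffer_mb)
```

With a hypothetical 1200 MB buffer, each of the 12 outputs gets 100 MB, so 1000 MB of sorted output per edge needs roughly 10 spills (and a merge) instead of 1 with the full buffer.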

PS: in 0.8.2, the tooling actually gets you something like:
