hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Hive query on Tez slower than on MR (fails in some cases) ..
Date Fri, 19 Feb 2016 19:39:26 GMT
Hi,

> Here's the Tez DAG swimlane. Haven't gotten vertex.py to work.. will
>send that too soon.

It's pretty clear from the swimlane that the map side is fine; splitting
the sort buffers isn't hurting this query at all.

We want to over-partition Reducer 7, and possibly have all of the reducer
vertices pick their total # of reducers dynamically:

set hive.exec.parallel=false; -- hive.exec.parallel=true is a bad idea on Tez

set hive.tez.auto.reducer.parallelism=true; -- decide on total # of reducers dynamically
set hive.tez.min.partition.factor=0.1;

set hive.tez.max.partition.factor=10;

set tez.shuffle-vertex-manager.min-src-fraction=0.9; -- slow start min (reducer counts are picked at this point)
set tez.shuffle-vertex-manager.max-src-fraction=0.99;

set tez.runtime.report.partition.stats=true;

(experimental!! - I'm still testing this for machine failure tolerance)

set tez.runtime.pipelined-shuffle.enabled=true;
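
For a rough sense of what the two partition factors above do, here is a
minimal Python sketch of the arithmetic. It assumes Hive over-partitions its
compile-time reducer estimate by hive.tez.max.partition.factor and lets Tez
shrink the count at runtime to no lower than the estimate times
hive.tez.min.partition.factor. The helper reducer_bounds and the estimate of
100 are hypothetical, and Hive's own rounding and caps (such as
hive.exec.reducers.max) are not modeled.

```python
def reducer_bounds(estimated, min_factor=0.1, max_factor=10.0):
    """Illustrative arithmetic only (hypothetical helper, not Hive's code).

    estimated:  reducer count Hive would pick at compile time (assumed input)
    min_factor: value of hive.tez.min.partition.factor
    max_factor: value of hive.tez.max.partition.factor
    """
    # Partitions created up front when over-partitioning the shuffle edge.
    upper = max(1, round(estimated * max_factor))
    # Lowest reducer count the runtime may shrink down to.
    lower = max(1, round(estimated * min_factor))
    return lower, upper


if __name__ == "__main__":
    lower, upper = reducer_bounds(100)
    print(lower, upper)  # 10 1000
```

Under those assumptions, an estimate of 100 reducers gives 1000 partitions up
front, and the final count is picked somewhere between 10 and 1000 once the
min-src-fraction of upstream tasks has reported.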


Cheers,
Gopal


