From: "Fabio C."
To: user@hive.apache.org
Date: Thu, 19 Feb 2015 10:47:27 +0100
Subject: Hive on tez - fix number of tasks

Hi everyone,

I see that Hive on Tez dynamically chooses the number of tasks to launch for each vertex in the generated DAG according to cluster load (in addition to data size). For research purposes I'd like to avoid this behavior, since I need every query (running on the same datasets) to be executed with the same number of tasks, regardless of the state of the cluster (if I run query X, n tasks have to be allocated in any case).

At the moment I can't run tests with heavy workloads, so I'd like to ask: do you think setting tez.am.grouping.min-size and tez.am.grouping.max-size to the same value can do the trick, or do you have any better suggestion to achieve this behavior?

Apart from this feature, is there anything else that could change the number of splits across different runs of the same query?

Thanks a lot

Fabio
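P.S. In case it helps to make the question concrete, this is the kind of session-level setup I was thinking of. The property names are the ones I found in the docs, so please correct me if they are outdated, and the reducer-side settings are just my guess at what else might introduce run-to-run variation:

```sql
-- Mapper side: pin the grouping range, assuming that min-size == max-size
-- forces a fixed group size regardless of cluster load
set tez.am.grouping.min-size=134217728;   -- 128 MB
set tez.am.grouping.max-size=134217728;   -- same value as min-size

-- Reducer side: stop Hive from adjusting reducer parallelism at runtime,
-- and fix the data volume per reducer so the reducer count stays constant
set hive.tez.auto.reducer.parallelism=false;
set hive.exec.reducers.bytes.per.reducer=134217728;
```

My plan would be to run query X with and without these settings and compare the per-vertex task counts across several runs.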