From: "Fabio C."
To: user@hive.apache.org
Date: Thu, 19 Feb 2015 10:47:27 +0100
Subject: Hive on tez - fix number of tasks

Hi everyone,

I see that Hive on Tez dynamically chooses the number of tasks to launch for each vertex in the generated DAG according to cluster load (in addition to data size). For research purposes I'd like to avoid this behavior, since I need every query (running on the same datasets) to be executed with the same number of tasks, regardless of the state of the cluster (if I run query X, n tasks have to be allocated in any case).

At the moment I can't run tests with heavy workloads, so I'd like to ask: do you think setting tez.am.grouping.min-size and tez.am.grouping.max-size to the same value can do the trick, or do you have any better suggestion to achieve this behavior?

Apart from this feature, is there anything else that could change the number of splits across different runs of the same query?

Thanks a lot

Fabio
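P.S. In case it helps to make the question concrete, this is the kind of session-level setup I was thinking of. The property names are the ones I found in the docs, so please correct me if they are outdated, and the reducer-side settings are just my guess at what else might introduce run-to-run variation:

```sql
-- Mapper side: pin the grouping range, assuming that min-size == max-size
-- forces a fixed group size regardless of cluster load
set tez.am.grouping.min-size=134217728;   -- 128 MB
set tez.am.grouping.max-size=134217728;   -- same value as min-size

-- Reducer side: stop Hive from adjusting reducer parallelism at runtime,
-- and fix the data volume per reducer so the reducer count stays constant
set hive.tez.auto.reducer.parallelism=false;
set hive.exec.reducers.bytes.per.reducer=134217728;
```

My plan would be to run query X with and without these settings and compare the per-vertex task counts across several runs.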