Date: Tue, 19 Aug 2014 16:05:18 +0000 (UTC)
From: "Hari Sekhon (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-7782) tez default engine not overridden by hive.execution.engine=mr in hive cli session

[ https://issues.apache.org/jira/browse/HIVE-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated HIVE-7782:
------------------------------
Description:
I've deployed hive.execution.engine=tez as the default on my secondary HDP cluster. I find that Hive CLI interactive sessions where I do
{code}
set hive.execution.engine=mr
{code}
still execute with Tez, as shown in the Resource Manager applications view.
Now this may make sense, since it's connected to a Tez session by that point, but it's also misleading: the job progress output in the CLI changes to look like MapReduce rather than Tez, and the query time increases from 8 to 15-16 secs, though that is still less than the 25-30+ secs I usually see with MR. The Resource Manager shows both of these jobs as TEZ application type regardless of setting hive.execution.engine=mr. Is this a bug in the way Hive is submitting the job (Tez vs MR) or a bug in the way the RM is reporting it?
{code}
hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1408444369445_0031)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1408444369445_0032, Tracking URL = http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1408444369445_0032
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-19 16:48:35,242 Stage-1 map = 0%, reduce = 0%
2014-08-19 16:48:40,539 Stage-1 map = 100%, reduce = 0%
2014-08-19 16:48:44,676 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1408444369445_0032
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
Time taken: 16.579 seconds, Fetched: 1 row(s)
{code}
If I exit the Hive shell and instead restart it using {code}--hiveconf hive.execution.engine=mr{code} to set the engine before the session is established, then it runs a proper MapReduce job according to the RM, and it also takes the longer expected 25 secs instead of the 8 in Tez or the 15 when trying to do MR inside a Tez session.

was:
I've deployed hive.execution.engine=tez as the default on my secondary HDP cluster. I find that Hive CLI interactive sessions where I do
{code}
set hive.execution.engine=mr
{code}
still execute with Tez, as shown in the Resource Manager applications view. Now this may make sense, since it's connected to a Tez session by that point, but it's also misleading: the job progress output in the CLI changes to look like MapReduce rather than Tez, and the query time increases, although only to 15-16 secs rather than the 25-30+ secs I usually see with MR. The Resource Manager shows both of these jobs as TEZ application type. Is this a bug in the way Hive is submitting the job (Tez vs MR) or a bug in the way the RM is reporting it?
{code}
hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1408444369445_0031)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1408444369445_0032, Tracking URL = http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1408444369445_0032
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-19 16:48:35,242 Stage-1 map = 0%, reduce = 0%
2014-08-19 16:48:40,539 Stage-1 map = 100%, reduce = 0%
2014-08-19 16:48:44,676 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1408444369445_0032
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
Time taken: 16.579 seconds, Fetched: 1 row(s)
{code}
If I exit the Hive shell and instead restart it using {code}--hiveconf hive.execution.engine=mr{code} to set the engine before the session is established, then it runs a proper MapReduce job according to the RM, and it also takes the longer expected 25 secs instead of the 8 in Tez or the 15 when trying to do MR inside a Tez session.
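To reproduce the workaround and to check what the RM actually recorded for each run, a minimal Python 3 sketch follows. It assumes the Hive CLI's `--hiveconf` and `-e` options and the YARN ResourceManager REST "Cluster Application" endpoint (`/ws/v1/cluster/apps/{appid}`, whose JSON `app` object carries an `applicationType` field) on port 8088, matching the Tracking URL in the log. `RM_BASE` is a hypothetical placeholder host, and `hive_cli_argv`, `application_type` and `fetch_application_type` are illustrative helper names, not part of Hive or YARN.

```python
from __future__ import annotations

import json
from urllib.request import Request, urlopen

# Hypothetical placeholder host: substitute your own ResourceManager address.
# Port 8088 matches the Tracking URL printed by the MR run in the log.
RM_BASE = "http://resourcemanager.example:8088"


def hive_cli_argv(query: str, engine: str = "mr") -> list[str]:
    """Build a Hive CLI command line that sets the execution engine via
    --hiveconf, so it is fixed before the session (and any Tez session) starts."""
    return ["hive", "--hiveconf", f"hive.execution.engine={engine}", "-e", query]


def application_type(app_json: str) -> str:
    """Extract applicationType (e.g. "TEZ" or "MAPREDUCE") from the JSON body
    returned by the RM Cluster Application API."""
    return json.loads(app_json)["app"]["applicationType"]


def fetch_application_type(app_id: str, rm_base: str = RM_BASE) -> str:
    """Ask the ResourceManager which application type it recorded for app_id."""
    url = f"{rm_base}/ws/v1/cluster/apps/{app_id}"
    # Request JSON explicitly; the RM REST API can also answer in XML.
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return application_type(resp.read().decode("utf-8"))
```

For example, `subprocess.run(hive_cli_argv("select count(*) from sample_07;"), check=True)` would start a fresh CLI with MR selected up front, and `fetch_application_type("application_1408444369445_0032")` would show what the RM recorded for the second run in the log; per this report, the RM listed that run as TEZ.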
> tez default engine not overridden by hive.execution.engine=mr in hive cli session
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-7782
>                 URL: https://issues.apache.org/jira/browse/HIVE-7782
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI, Tez
>         Environment: HDP2.1
>            Reporter: Hari Sekhon
>            Priority: Minor
>              Labels: cli, hive, tez, yarn

--
This message was sent by Atlassian JIRA
(v6.2#6252)