Subject: Re: trying to figure out number of MR jobs from explain output
From: Nicholas Hakobian
To: user@hive.apache.org
Date: Fri, 11 Dec 2015 14:55:31 -0800

You can't find out definitively, because it depends on the nature of the
data being processed, especially when it comes to mapjoins. If the output
of one stage is small enough to be mapjoined, parts of a stage can be
skipped, since the whole dataset is on every node. I'm sure there are
other conditions as well, but that is the general idea.

-Nick

Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com

On Fri, Dec 11, 2015 at 2:00 PM, Ophir Etzion wrote:
> Hi,
>
> I've been trying to figure out how to know the number of MR jobs that will
> be run for a hive query using the EXPLAIN output.
>
> I haven't got to a consistent method for knowing that.
>
> for example (in one of my queries, ctas query):
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-8 depends on stages: Stage-0
>   Stage-2 depends on stages: Stage-8
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>
> Stage-1, Stage-3, Stage-5 are listed as map reduce steps.
>
> eventually 2 MR jobs ran.
>
> in other cases only 1 job runs.
>
> I couldn't find a consistent rule on how to figure this out.
>
> can anyone help??
>
> Thank you!!
>
> below is full output
>
> explain CREATE TABLE beekeeper_results.test3 ROW FORMAT SERDE
> "com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde" WITH
> SERDEPROPERTIES ('escape.delim'='\\', 'mapkey.delim'='\;',
> 'colelction.delim'='|') AS SELECT * FROM beekeeper_results.test2;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
>   Stage-4
>   Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
>   Stage-8 depends on stages: Stage-0
>   Stage-2 depends on stages: Stage-8
>   Stage-3
>   Stage-5
>   Stage-6 depends on stages: Stage-5
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: test2
>             Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: blasttag (type: string), actioncounts (type:
>                 array>), detailedclicks (type:
>                 array>), countsbyclient
>                 (type: array>),
>                 totalactioncounts (type: array>),
>                 actionsbydate (type:
>                 array>)
>               outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
>               Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 112 Data size: 11690 Basic stats: COMPLETE Column stats: NONE
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
>                     name: beekeeper_results.test3
>
>   Stage: Stage-7
>     Conditional Operator
>
>   Stage: Stage-4
>     Move Operator
>       files:
>           hdfs directory: true
>           destination: hdfs://hadoop-alidoro-nn-vip/user/hive/warehouse/.hive-staging_hive_2015-12-11_21-52-35_063_8498858370292854265-1/-ext-10001
>
>   Stage: Stage-0
>     Move Operator
>       files:
>           hdfs directory: true
>           destination: ***
>
>   Stage: Stage-8
>     Create Table Operator:
>       Create Table
>         columns: blasttag string, actioncounts
>           array>, detailedclicks
>           array>, countsbyclient
>           array>, totalactioncounts
>           array>, actionsbydate
>           array>
>         input format: org.apache.hadoop.mapred.TextInputFormat
>         output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
>         serde name: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
>         serde properties:
>           colelction.delim |
>           escape.delim \
>           mapkey.delim ;
>         name: beekeeper_results.test3
>
>   Stage: Stage-2
>     Stats-Aggr Operator
>
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             File Output Operator
>               compressed: false
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
>                   name: beekeeper_results.test3
>
>   Stage: Stage-5
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             File Output Operator
>               compressed: false
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: com.foursquare.hadoop.hive.serde.lazycsv.LazySimpleCSVSerde
>                   name: beekeeper_results.test3
>
>   Stage: Stage-6
>     Move Operator
>       files:
>           hdfs directory: true
>           destination: ***
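
Although the exact count depends on the data, the stage list quoted above can
at least be turned into a lower and upper bound. The sketch below is only an
illustration: it assumes that a conditional stage (the "consists of" list) runs
exactly one of its alternatives at runtime, which the EXPLAIN output itself
does not state. The helper names conditional_branches and mr_job_bounds are
made up for this example, and the set of Map Reduce stages is entered by hand
from the STAGE PLANS section.

```python
import re

# Stage dependency lines copied from the EXPLAIN output quoted above.
STAGE_DEPENDENCIES = """\
Stage-1 is a root stage
Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
Stage-4
Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
Stage-8 depends on stages: Stage-0
Stage-2 depends on stages: Stage-8
Stage-3
Stage-5
Stage-6 depends on stages: Stage-5
"""

# Stages that the STAGE PLANS section labels "Map Reduce" (entered by hand).
MAP_REDUCE_STAGES = {"Stage-1", "Stage-3", "Stage-5"}


def conditional_branches(dependencies: str) -> dict[str, list[str]]:
    """Map each conditional stage to the alternatives listed after 'consists of'."""
    branches = {}
    for line in dependencies.splitlines():
        head, sep, tail = line.partition("consists of")
        if sep:
            stage = re.match(r"\s*(Stage-\d+)", head).group(1)
            branches[stage] = re.findall(r"Stage-\d+", tail)
    return branches


def mr_job_bounds(dependencies: str, mr_stages: set[str]) -> tuple[int, int]:
    """Return (min, max) MR jobs.

    ASSUMPTION: exactly one alternative of each conditional stage runs.
    """
    branches = conditional_branches(dependencies)
    alternatives = {s for alts in branches.values() for s in alts}
    # MR stages that are not an alternative of any conditional always run.
    low = high = len(mr_stages - alternatives)
    for alts in branches.values():
        costs = [1 if s in mr_stages else 0 for s in alts]
        low += min(costs)
        high += max(costs)
    return low, high


print(mr_job_bounds(STAGE_DEPENDENCIES, MAP_REDUCE_STAGES))
```

Under that assumption the plan gives a range of 1 to 2 jobs, which agrees with
the observation that sometimes 2 MR jobs run and sometimes only 1. Which
alternative is actually chosen is decided at runtime, so the plan alone cannot
narrow the range further.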