Return-Path: Delivered-To: apmail-hadoop-core-commits-archive@www.apache.org Received: (qmail 12324 invoked from network); 21 Jan 2009 23:33:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 21 Jan 2009 23:33:57 -0000 Received: (qmail 64483 invoked by uid 500); 21 Jan 2009 23:33:56 -0000 Delivered-To: apmail-hadoop-core-commits-archive@hadoop.apache.org Received: (qmail 64459 invoked by uid 500); 21 Jan 2009 23:33:56 -0000 Mailing-List: contact core-commits-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-commits@hadoop.apache.org Received: (qmail 64450 invoked by uid 99); 21 Jan 2009 23:33:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Jan 2009 15:33:56 -0800 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Jan 2009 23:33:54 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id 8784911150 for ; Wed, 21 Jan 2009 23:33:33 +0000 (GMT) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Apache Wiki To: core-commits@hadoop.apache.org Date: Wed, 21 Jan 2009 23:33:33 -0000 Message-ID: <20090121233333.14570.47430@eos.apache.org> Subject: [Hadoop Wiki] Trivial Update of "Hive/LanguageManual/LanguageManual/Explain" by AshishThusoo X-Virus-Checked: Checked by ClamAV on apache.org Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The following page has been changed by AshishThusoo: http://wiki.apache.org/hadoop/Hive/LanguageManual/LanguageManual/Explain New page: == EXPLAIN Syntax == Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows: {{{ EXPLAIN [EXTENDED] query }}} The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names. A Hive query gets converted into a sequence (it is more an Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output comprises of three parts: * The Abstract Syntax Tree for the query * The dependencies between the different stages of the plan * The description of each of the stages The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise of things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator. As an example, consider the following EXPLAIN query: {{{ EXPLAIN FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key; }}} The output of this statement contains the following parts: * The Abstract Syntax Tree {{{ ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key)))) }}} * The Dependency Graph {{{ STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-0 depends on stages: Stage-2 }}} This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done. * The plans of each Stage {{{ STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: src Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: rand() type: double tag: -1 value expressions: expr: substr(value, 4) type: string Reduce Operator Tree: Group By Operator aggregations: expr: sum(UDFToDouble(VALUE.0)) keys: expr: KEY.0 type: string mode: partial1 File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.mapred.SequenceFileOutputFormat name: binary_table Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: /tmp/hive-zshao/67494501/106593589.10001 Reduce Output Operator key expressions: expr: 0 type: string sort order: + Map-reduce partition columns: expr: 0 type: string tag: -1 value expressions: expr: 1 type: double Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE.0) keys: expr: KEY.0 type: string mode: final Select Operator expressions: expr: 0 type: string expr: 1 type: double Select Operator expressions: expr: UDFToInteger(0) type: int expr: 1 type: double File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe name: dest_g1 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe name: dest_g1 }}} In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1. A map/reduce stage itself comprises of 2 parts: * A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata. * A Reduce Operator Tree - This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation where as the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1