hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sreekanth Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job
Date Thu, 02 Dec 2010 10:06:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966046#action_12966046
] 

Sreekanth Ramakrishnan commented on HIVE-1695:
----------------------------------------------

Group By operator Plan:

{noformat}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF test a) (TOK_TABREF test1 b) (= (. (TOK_TABLE_OR_COL
a) key) (. (TOK_TABLE_OR_COL b) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
(TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST b))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL
a) key))) (TOK_GROUPBY (. (TOK_TABLE_OR_COL a) key))))

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b 
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {key}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 0

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a 
          TableScan
            alias: a
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: int
                sort order: +
                Map-reduce partition columns:
                      expr: _col0
                      type: int
                tag: -1
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: int
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: int
            outputColumnNames: _col0
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{noformat}

I have successfully got group by and order by working with new NodeProcessor Which I implemented,
cross checked it with the results from before and after the plan alterations were done.

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map only job
followed by a Map-Reduce job. It can be combined into single MapReduce Job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message