hive-dev mailing list archives

From Johannes Alkjær (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-4598) Incorrect results when using subquery in multi table insert
Date Mon, 30 Sep 2013 22:00:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782314#comment-13782314 ]

Johannes Alkjær commented on HIVE-4598:
---------------------------------------

Adding an extra select block fixes the execution plan, though:
{code}
EXPLAIN
FROM (
    SELECT * FROM ( 
         FROM ( SELECT * FROM sample ) mapout 
         REDUCE * USING 'cat' AS x,y
    ) reduced
) zz
insert overwrite local directory '/tmp/a' select * where x='a' or x='b'
insert overwrite local directory '/tmp/b' select * where x='c' or x='d';
{code}

{code}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM
(TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME sample))) (TOK_INSERT (TOK_DESTINATION
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)))) mapout)) (TOK_INSERT (TOK_DESTINATION
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TRANSFORM (TOK_EXPLIST TOK_ALLCOLREF)
TOK_SERDE TOK_RECORDWRITER 'cat' TOK_SERDE TOK_RECORDREADER (TOK_ALIASLIST x y)))))) reduced))
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))
zz)) (TOK_INSERT (TOK_DESTINATION (TOK_LOCAL_DIR '/tmp/a')) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))
(TOK_WHERE (or (= (TOK_TABLE_OR_COL x) 'a') (= (TOK_TABLE_OR_COL x) 'b')))) (TOK_INSERT (TOK_DESTINATION
(TOK_LOCAL_DIR '/tmp/b')) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (= (TOK_TABLE_OR_COL
x) 'c') (= (TOK_TABLE_OR_COL x) 'd')))))

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        zz:reduced:mapout:sample 
          TableScan
            alias: sample
            Select Operator
              expressions:
                    expr: key
                    type: string
                    expr: val
                    type: string
              outputColumnNames: _col0, _col1
              Transform Operator
                command: cat
                output info:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                Select Operator
                  expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: string
                  outputColumnNames: _col0, _col1
                  Filter Operator
                    predicate:
                        expr: ((_col0 = 'a') or (_col0 = 'b'))
                        type: boolean
                    Select Operator
                      expressions:
                            expr: _col0
                            type: string
                            expr: _col1
                            type: string
                      outputColumnNames: _col0, _col1
                      File Output Operator
                        compressed: false
                        GlobalTableId: 1
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  Filter Operator
                    predicate:
                        expr: ((_col0 = 'c') or (_col0 = 'd'))
                        type: boolean
                    Select Operator
                      expressions:
                            expr: _col0
                            type: string
                            expr: _col1
                            type: string
                      outputColumnNames: _col0, _col1
                      File Output Operator
                        compressed: false
                        GlobalTableId: 2
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Move Operator
      files:
          hdfs directory: false
          destination: /tmp/a

  Stage: Stage-1
    Move Operator
      files:
          hdfs directory: false
          destination: /tmp/b
{code}
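
For illustration only, the same extra select block applied to the query shape from the issue description might look like the sketch below. The source table name {{src}} and the aliases {{inner_q}} and {{x}} are placeholders that do not appear in the report, and the query is untested; whether the wrapper also avoids the problem for a plain (non-TRANSFORM) subquery is not confirmed here.
{code}
-- Hypothetical sketch: src, inner_q and x are placeholder names.
FROM (
    -- extra select block wrapped around the original subquery
    SELECT * FROM (
        SELECT * FROM src
    ) inner_q
) x
INSERT INTO TABLE t PARTITION (type='x')
SELECT * WHERE type='x'
INSERT INTO TABLE t PARTITION (type='y')
SELECT * WHERE type='y';
{code}
Running it under EXPLAIN first and checking that each File Output Operator sits beneath its own Filter Operator, as in the plan above, would show whether the WHERE clauses are kept.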

> Incorrect results when using subquery in multi table insert
> -----------------------------------------------------------
>
>                 Key: HIVE-4598
>                 URL: https://issues.apache.org/jira/browse/HIVE-4598
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: Sebastian
>
> I'm using a multi table insert like this:
> FROM <x>
> INSERT INTO TABLE t PARTITION (type='x')
> SELECT * WHERE type='x'
> INSERT INTO TABLE t PARTITION (type='y')
> SELECT * WHERE type='y';
> Now when <x> is the name of a table, everything works as expected.
> However if I use a subquery as <x>, the query runs but it inserts all results from
> the subquery into each partition, as if there were no "WHERE" clauses in the selects.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
