hive-user mailing list archives

From Wil - <wil.sh_...@yahoo.com>
Subject transform with distribute by bug?
Date Thu, 02 Oct 2014 19:35:55 GMT
Hi,

I am having an issue with a query that involves a transform with a distribute by.  The query
works in Hive 0.11 but fails in Hive 0.13 (both on EMR).

shell> echo -e "a\tb\tc" > /tmp/a.txt

create table tbl (
  a string,
  b string,
  c string
)
row format delimited
  fields terminated by '\t'
;

load data local inpath '/tmp/a.txt' overwrite into table tbl;

select transform(a, b)
  using 'python foo.py' as (y, z)
from (
  select a, b, c
  from tbl
  distribute by c
) tmp
;
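
The contents of foo.py are not shown in this message. For anyone trying to reproduce this, a minimal stand-in is sketched below. It is hypothetical, not the original script: it assumes a pass-through that reads the two tab-separated input columns (a, b) from stdin and writes them back out as (y, z), relying on Hive's default tab-delimited text format for TRANSFORM.

```python
import sys


def transform(lines):
    """Yield one tab-separated (y, z) row for each (a, b) input row."""
    for line in lines:
        # Hive feeds TRANSFORM scripts tab-separated text by default.
        fields = line.rstrip("\n").split("\t")
        a, b = fields[0], fields[1]
        # Hypothetical pass-through: y = a, z = b.
        yield a + "\t" + b


if __name__ == "__main__":
    for row in transform(sys.stdin):
        sys.stdout.write(row + "\n")
```

Since the error is raised while resolving columns in the plan, the body of the script probably does not matter for reproducing it.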


Error:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException:
cannot find field _col2 from [0:_col0, 1:_col1]
> Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0, 1:_col1]


However, it works if I add a sort by to the distribute by:

select transform(a, b)
  using 'python foo.py' as (y, z)
from (
  select a, b, c
  from tbl
  distribute by c sort by c
) tmp
;



Is this valid behavior or a bug?


Hive 0.13 plan (non-working query):

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tbl
            Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: a (type: string), b (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                sort order:
                Map-reduce partition columns: _col2 (type: string)
                Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
                value expressions: _col0 (type: string), _col1 (type: string)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
            Transform Operator
              command: python foo.py
              output info:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1


Hive 0.13 plan (working query):

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tbl
            Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: a (type: string), b (type: string), c (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: _col2 (type: string)
                sort order: +
                Map-reduce partition columns: _col2 (type: string)
                Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
                value expressions: _col0 (type: string), _col1 (type: string)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
            Transform Operator
              command: python foo.py
              output info:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 0 Data size: 6 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
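
Comparing the two 0.13 plans above, the visible difference is in the map-side Select Operator: the non-working plan emits only _col0 and _col1 while the Reduce Output Operator still partitions on _col2, which matches the "cannot find field _col2" error. The sketch below only illustrates that mismatch using the column names copied from the plans; the helper name missing_columns is made up for illustration and this is not how Hive itself resolves columns.

```python
def missing_columns(available, referenced):
    """Return the referenced columns that the upstream operator does not emit."""
    return [col for col in referenced if col not in available]


# Column names copied from the two Hive 0.13 plans in this message.
failing_select_output = ["_col0", "_col1"]
working_select_output = ["_col0", "_col1", "_col2"]
partition_columns = ["_col2"]

print(missing_columns(failing_select_output, partition_columns))  # ['_col2']
print(missing_columns(working_select_output, partition_columns))  # []
```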


Hive 0.11 plan:

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME tbl)))
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL
a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_TABLE_OR_COL c))) (TOK_DISTRIBUTEBY
(TOK_TABLE_OR_COL c)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
(TOK_SELEXPR (TOK_TRANSFORM (TOK_EXPLIST (TOK_TABLE_OR_COL a) (TOK_TABLE_OR_COL b)) TOK_SERDE
TOK_RECORDWRITER 'python foo.py' TOK_SERDE TOK_RECORDREADER (TOK_ALIASLIST y z))))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        tmp:tbl
          TableScan
            alias: tbl
            Select Operator
              expressions:
                    expr: a
                    type: string
                    expr: b
                    type: string
                    expr: c
                    type: string
              outputColumnNames: _col0, _col1, _col2
              Reduce Output Operator
                sort order:
                Map-reduce partition columns:
                      expr: _col2
                      type: string
                tag: -1
                value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
                      expr: _col2
                      type: string
      Reduce Operator Tree:
        Extract
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: string
            outputColumnNames: _col0, _col1
            Transform Operator
              command: python foo.py
              output info:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
