hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-626) Typecast bug in Join operator
Date Fri, 10 Jul 2009 23:14:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-626:
----------------------------

    Attachment: HIVE-626.1.showinfo.patch

I added some instrumentation to the code (see HIVE-626.1.showinfo.patch). The result of "explain
extended" (below) shows that the order of the output columns of the JoinOperator does not match
that of the FileSinkOperator:

{code}
hive> explain extended
    > select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join zshao_bar on zshao_foo.foo_id =
    > zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF zshao_foo) (TOK_TABREF zshao_bar) (=
(. (TOK_TABLE_OR_COL zshao_foo) foo_id) (. (TOK_TABLE_OR_COL zshao_bar) foo_id))) (TOK_TABREF
zshao_count) (= (. (TOK_TABLE_OR_COL zshao_count) bar_id) (. (TOK_TABLE_OR_COL zshao_bar)
bar_id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (.
(TOK_TABLE_OR_COL zshao_foo) foo_name)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL zshao_bar) bar_name))
(TOK_SELEXPR (TOK_TABLE_OR_COL n)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
...
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col1}
            1 {VALUE._col0} {VALUE._col4}
          output names: _col1, _col6, _col10
          File Output Operator
            compressed: true
            GlobalTableId: 0
            directory: hdfs://xxx:9000/tmp/hive-zshao/1413634235/10002
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  name binary_table
                  serialization.ddl struct binary_table { string _col1, string _col10, i32 _col6}
                  serialization.format com.facebook.thrift.protocol.TBinaryProtocol
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        $INTNAME
...

{code}


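The JoinOperator emits its columns in the order given by "output names": _col1, _col6, _col10.
The FileSinkOperator expects: struct binary_table { string _col1, string _col10, i32 _col6}
Since the serialized row is read by position, the second and third values end up swapped relative
to their declared types: the i32 _col6 sits where a string is expected, and the string _col10 sits
where an i32 is expected.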
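A small Python model of the two orders makes the mismatch concrete. It is a hypothetical sketch, not Hive code: the name-to-type map is read off the serialization.ddl line in the plan above, and positional_mismatches is an illustrative helper. It also shows that the FileSinkOperator's order coincides with a plain string sort of the column names ("_col10" sorts before "_col6"); that is only an observation and may or may not be where the order gets lost.

```python
# Hypothetical model of the HIVE-626 column-order mismatch (not Hive code).
# Types come from the serialization.ddl line in the explain output.
sink_types = {"_col1": "string", "_col10": "string", "_col6": "i32"}

# Order in which the JoinOperator emits columns ("output names").
join_order = ["_col1", "_col6", "_col10"]

# Order declared by the FileSinkOperator's serialization.ddl.
sink_order = ["_col1", "_col10", "_col6"]

# Observation: the sink order equals a plain lexicographic sort of the names,
# because '1' < '6' makes "_col10" sort before "_col6".
assert sorted(join_order) == sink_order


def positional_mismatches(produced, expected, types):
    """Match columns by position (not by name) and return
    (position, produced_col, expected_col) wherever the declared types differ."""
    out = []
    for pos, (p, e) in enumerate(zip(produced, expected), start=1):
        if types[p] != types[e]:
            out.append((pos, p, e))
    return out


for pos, p, e in positional_mismatches(join_order, sink_order, sink_types):
    print(f"position {pos}: join emits {p} ({sink_types[p]}), "
          f"sink expects {e} ({sink_types[e]})")
# position 2: join emits _col6 (i32), sink expects _col10 (string)
# position 3: join emits _col10 (string), sink expects _col6 (i32)
```

Position 1 is unaffected because _col1 sorts first either way; only the two columns whose string order differs from their numeric order are misplaced.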


> Typecast bug in Join operator
> -----------------------------
>
>                 Key: HIVE-626
>                 URL: https://issues.apache.org/jira/browse/HIVE-626
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-626.1.showinfo.patch
>
>
> There is a typecast error in the Join operator. It can be reproduced with the following steps:
> {code}
> create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b string,
> foo_c string, foo_d string) row format delimited fields terminated by ','
> stored as textfile;
> create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name
> string, bar_a string, bar_b string, bar_c string, bar_d string) row format
> delimited fields terminated by ',' stored as textfile;
> create table zshao_count (bar_id int, n int) row format delimited fields
> terminated by ',' stored as textfile;
> -- Each table has a single row; the local files loaded below contain:
> -- zshao_foo:
> --   1,foo1,a,b,c,d
> -- zshao_bar:
> --   10,0,1,1,bar10,a,b,c,d
> -- zshao_count:
> --   10,2
> load data local inpath 'zshao_foo' overwrite into table zshao_foo;
> load data local inpath 'zshao_bar' overwrite into table zshao_bar;
> load data local inpath 'zshao_count' overwrite into table zshao_count;
> explain extended
> select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join zshao_bar on zshao_foo.foo_id =
> zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
> {code}
> The case is from David Lerman.
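For comparison, the correct result of the reproduction query (run without "explain extended") can be worked out by hand from the three sample rows. The Python sketch below is a naive nested-loop evaluation of the query's semantics only; it does not model how Hive plans or executes the join. Column positions follow the CREATE TABLE statements above (bar_id is column 0, foo_id column 2, and bar_name column 4 of zshao_bar).

```python
# Naive nested-loop evaluation of the repro query over the sample rows.
# Models only the query semantics, not Hive's execution plan.
zshao_foo = [(1, "foo1", "a", "b", "c", "d")]
zshao_bar = [(10, 0, 1, 1, "bar10", "a", "b", "c", "d")]
zshao_count = [(10, 2)]


def run_query():
    rows = []
    for foo in zshao_foo:
        for bar in zshao_bar:
            # zshao_foo.foo_id = zshao_bar.foo_id
            if foo[0] != bar[2]:
                continue
            for cnt in zshao_count:
                # zshao_count.bar_id = zshao_bar.bar_id
                if cnt[0] != bar[0]:
                    continue
                # select zshao_foo.foo_name, zshao_bar.bar_name, n
                rows.append((foo[1], bar[4], cnt[1]))
    return rows


print(run_query())  # [('foo1', 'bar10', 2)]
```

A correct run should therefore return the single row foo1, bar10, 2, with n as an int.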

-- 
This message is automatically generated by JIRA.

