hive-issues mailing list archives

From "Aihua Xu (JIRA)" <>
Subject [jira] [Commented] (HIVE-16291) Hive fails when unions a parquet table with itself
Date Mon, 03 Apr 2017 19:11:41 GMT


Aihua Xu commented on HIVE-16291:

[~ashutoshc] The problem is not in setting READ_ALL_COLUMNS to false. Rather, when ids is
empty and old = "0", newConfStr becomes ",0":

    if (old != null && !old.isEmpty()) {
      newConfStr = newConfStr + StringUtils.COMMA_STR + old;
    }

So when id is empty, we only need to set READ_ALL_COLUMNS to false.
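
To make the failure mode concrete, here is a minimal, self-contained sketch of the concatenation described above. It is not Hive's code: the class and method names are hypothetical, the comma constant stands in for StringUtils.COMMA_STR, and the READ_ALL_COLUMNS flag is not modeled. The guarded variant is only one possible reading of the suggested fix (leave the stored ids untouched when there is nothing new to add).

```java
// Hypothetical sketch; names are illustrative and not taken from Hive.
final class ReadColumnIdsSketch {
    // Stand-in for StringUtils.COMMA_STR in the quoted snippet.
    static final String COMMA = ",";

    /** Mirrors the quoted snippet: always prepends the new ids, even when empty. */
    static String appendUnguarded(String newIds, String old) {
        String newConfStr = newIds;
        if (old != null && !old.isEmpty()) {
            newConfStr = newConfStr + COMMA + old;
        }
        return newConfStr;
    }

    /** Assumed guard: with no new ids, keep the old value unchanged. */
    static String appendGuarded(String newIds, String old) {
        if (newIds == null || newIds.isEmpty()) {
            return old == null ? "" : old;
        }
        if (old == null || old.isEmpty()) {
            return newIds;
        }
        return newIds + COMMA + old;
    }

    public static void main(String[] args) {
        // Empty new ids plus old "0" yields a leading comma.
        System.out.println("[" + appendUnguarded("", "0") + "]");
        // The guarded variant leaves the old value as it was.
        System.out.println("[" + appendGuarded("", "0") + "]");
    }
}
```

With an empty id string and old = "0", the unguarded variant returns ",0", while the guarded one returns "0".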

> Hive fails when unions a parquet table with itself
> --------------------------------------------------
>                 Key: HIVE-16291
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-16291.1.patch
> Reproduce commands:
> {code:sql}
> create table tst_unin (col1 int) partitioned by (p_tdate int) stored as parquet;
> insert into tst_unin partition (p_tdate=201603) values (20160312), (20160310);
> insert into tst_unin partition (p_tdate=201604) values (20160412), (20160410);
> select count(*) from (select tst_unin.p_tdate from tst_unin where tst_unin.col1=20160302 union all select tst_unin.p_tdate from tst_unin) t1;
> {code}
> The table is stored in Parquet format, which is a columnar file format. Hive tries to
> push the query predicates to the table scan operators so that only the needed columns are
> read. This is done by adding the needed column IDs to the job configuration under the property "".
> In the above case, the query unions the results of two subqueries that select data from
> the same table. The first subquery doesn't need any column from the Parquet file, while the
> second subquery needs the column "col1". Hive has a bug here: it ends up setting ""
> to a value like "0,,0", which the method ColumnProjectionUtils.getReadColumnIDs cannot parse.
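
As a rough illustration of why a value like "0,,0" is a problem, the sketch below parses a comma-separated ID list strictly. It is an assumption about the general failure mode, not the implementation of ColumnProjectionUtils.getReadColumnIDs; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strict parser; not Hive's actual getReadColumnIDs.
final class ParseIdsSketch {
    /** Splits on commas and parses every piece as an int. */
    static List<Integer> parseStrict(String ids) {
        List<Integer> result = new ArrayList<>();
        if (ids == null || ids.isEmpty()) {
            return result;
        }
        for (String part : ids.split(",")) {
            // An empty piece (from ",,") makes parseInt throw NumberFormatException.
            result.add(Integer.parseInt(part));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("0,1"));
        try {
            parseStrict("0,,0");
        } catch (NumberFormatException e) {
            System.out.println("cannot parse: " + e.getMessage());
        }
    }
}
```

Any parser of this shape rejects the empty element between the two commas, which is why the empty id has to be kept out of the property value in the first place.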

This message was sent by Atlassian JIRA
