hive-user mailing list archives

From wzc <wzc1...@gmail.com>
Subject Re: orc ppd bug report
Date Tue, 06 Jan 2015 05:59:55 GMT
@Prasanth, could you help me look into this problem?

Thanks.

On Mon Jan 05 2015 at 12:03:42 AM wzc <wzc1989@gmail.com> wrote:

> Recently we found a bug in ORC PPD (predicate pushdown). Here is the test case:
>
> use test;
> create table if not exists test_orc_src (a int, b int, c int)
> stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int)
> stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from dim.city
> limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from dim.city
> limit 1;
> set hive.auto.convert.join = false;
> select
>   tb.c
> from test.test_orc_src tb
> join test.test_orc_src2 tm
> on tb.a = tm.a
> where tb.b = 2
>
> The correct answer for the above query is 3, but it returns an empty result.
> We found that ORC PPD uses the READ_COLUMN_NAMES_CONF_STR property to get the
> required column list, and that list is not constructed correctly when one
> table's storage path is a prefix of another table's path (as with
> test_orc_src and test_orc_src2 above). This bug is related to
> HIVE-1903 <https://issues.apache.org/jira/browse/HIVE-1903>:
> in HiveInputFormat#pushProjectionsAndFilters, a prefix match is used to
> get all aliases associated with the given path, which I think is not
> suitable. I don't know why a prefix match is used here instead of an exact
> match.
> Any help is appreciated.
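
To make the path-matching problem concrete, here is a minimal Python sketch. It is not Hive's code: the warehouse paths, the `PATH_TO_ALIASES` mapping, and both function names are hypothetical and chosen only for illustration. It shows how a plain string-prefix test associates a split of test_orc_src2 with the alias of test_orc_src, and how a separator-aware comparison (one possible alternative, not necessarily the right fix for Hive) avoids that.

```python
# Hypothetical table locations; real warehouse paths will differ.
PATH_TO_ALIASES = {
    "/warehouse/test.db/test_orc_src": ["tb"],
    "/warehouse/test.db/test_orc_src2": ["tm"],
}


def aliases_by_prefix(split_path):
    """Plain string-prefix match, the behaviour described in the report."""
    found = []
    for table_path, aliases in PATH_TO_ALIASES.items():
        if split_path.startswith(table_path):
            found.extend(aliases)
    return found


def aliases_by_directory(split_path):
    """Match only the table path itself or paths located below it."""
    found = []
    for table_path, aliases in PATH_TO_ALIASES.items():
        if split_path == table_path or split_path.startswith(table_path + "/"):
            found.extend(aliases)
    return found


# A file belonging to test_orc_src2 (hypothetical file name).
split = "/warehouse/test.db/test_orc_src2/000000_0"
print(aliases_by_prefix(split))     # both aliases are picked up
print(aliases_by_directory(split))  # only the alias of test_orc_src2
```

With the plain prefix test, the split from test_orc_src2 is also attributed to alias tb, so column and filter information for the wrong table could be pushed down for that split.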
