hive-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled
Date Sun, 19 Oct 2014 22:48:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176475#comment-14176475 ]

Gunther Hagleitner commented on HIVE-8498:
------------------------------------------

When I did the vectorized dynamic pruning work, there was no problem with vectorization,
so it seems that the multi-child case is at least partially working. Do we know why the
multi-insert case is failing? The fix might not be that difficult, or is it?

I can see how the correlation optimizer might be trickier. As far as I remember, that one
produces diamond shapes in the plan.

> Insert into table misses some rows when vectorization is enabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-8498
>                 URL: https://issues.apache.org/jira/browse/HIVE-8498
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Matt McCline
>            Priority: Critical
>              Labels: vectorization
>         Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch
>
>
> The following is a small reproducible case for the issue:
> create table orc1
>   stored as orc
>   tblproperties("orc.compress"="ZLIB")
>   as
>     select rn
>     from
>     (
>       select cast(1 as int) as rn from src limit 1
>       union all
>       select cast(100 as int) as rn from src limit 1
>       union all
>       select cast(10000 as int) as rn from src limit 1
>     ) t;
> create table orc_rn1 (rn int);
> create table orc_rn2 (rn int);
> create table orc_rn3 (rn int);
> -- These inserts should produce 3 rows, but only 1 row is produced
> from orc1 a
> insert overwrite table orc_rn1 select a.* where a.rn < 100
> insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
> insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
> select * from orc_rn1
> union all
> select * from orc_rn2
> union all
> select * from orc_rn3;
> The expected output of the query is
> 1
> 100
> 10000
> But with vectorization enabled we get
> 1
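
One way to check that vectorization is what triggers the missing rows is to rerun the same multi-insert with it switched off and compare the results. This is only a sketch: it assumes the tables from the reproducible case above already exist and that hive.vectorized.execution.enabled is the setting controlling vectorization in the session.

```sql
-- Assumption: vectorization is controlled by this session-level setting.
set hive.vectorized.execution.enabled=false;

-- Same multi-insert as in the reported case, now without vectorization.
from orc1 a
insert overwrite table orc_rn1 select a.* where a.rn < 100
insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
insert overwrite table orc_rn3 select a.* where a.rn >= 1000;

-- Collect the rows from all three target tables.
select * from orc_rn1
union all
select * from orc_rn2
union all
select * from orc_rn3;

-- Turn vectorization back on for a second run to compare against.
set hive.vectorized.execution.enabled=true;
```

If the non-vectorized run returns all three rows (1, 100, 10000) while the vectorized run returns only 1, that points at the vectorized multi-insert path rather than at the data or the ORC table.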



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
