hive-dev mailing list archives

From "Amruth S (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14741) Incorrect results on boolean col when vectorization is ON
Date Tue, 13 Sep 2016 12:02:20 GMT
Amruth S created HIVE-14741:
-------------------------------

             Summary: Incorrect results on boolean col when vectorization is ON
                 Key: HIVE-14741
                 URL: https://issues.apache.org/jira/browse/HIVE-14741
             Project: Hive
          Issue Type: Bug
    Affects Versions: 2.1.0, 2.0.0
            Reporter: Amruth S


I have attached the ORC part file on which the issue manifests. It has just one boolean
column (many nulls and 231 true values, verified using the ORC file dump utility).

1) Create external table on the part file attached

CREATE EXTERNAL TABLE bool_vect_issue (
`bool_col` BOOLEAN)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'<loc to which the part file is copied>';

2) 
set hive.vectorized.execution.enabled = true;
select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
gives
708206

3) 
set hive.vectorized.execution.enabled = false;
select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
gives
231
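
A possible follow-up check, not part of the original report: breaking the rows down by value under each setting could help show whether the extra rows counted in step 2 come from the null entries. This is only a sketch against the same bool_vect_issue table. No output is shown because these queries were not run for this report.

```sql
-- Hypothetical diagnostic, assuming the table from step 1 exists.
-- Per-value row counts (true / false / NULL) with vectorization ON.
set hive.vectorized.execution.enabled = true;
select bool_col, count(*) from bool_vect_issue group by bool_col;

-- Same breakdown with vectorization OFF, for comparison.
set hive.vectorized.execution.enabled = false;
select bool_col, count(*) from bool_vect_issue group by bool_col;
```

If the two breakdowns agree while the sum(if(...)) results still differ, the problem is more likely in how the vectorized if() handles nulls than in how the ORC column is read.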

The issue seems to have the same impact as HIVE-12435.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
