hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
Date Tue, 22 Mar 2016 20:54:25 GMT

     [ https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-13330:
-----------------------------------------
    Attachment: HIVE-13330.1.patch

> ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13330
>                 URL: https://issues.apache.org/jira/browse/HIVE-13330
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 2.0.0, 2.1.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>         Attachments: HIVE-13330.1.patch
>
>
> The vectorized string dictionary reader cannot distinguish the case where all dictionary
> entries are null from the case where the dictionary holds a single empty-string entry.
> This causes wrong results when reading data from such files.
> {code:title=Vectorization On}
> SET hive.vectorized.execution.enabled=true;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> NULL
> {code}
> {code:title=Vectorization Off}
> SET hive.vectorized.execution.enabled=false;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> {code}
> The input table testnullorc3 contains a varchar column vcol with a few empty strings and
> a few nulls. For this table, the non-vectorized reader returns an empty string as the
> first row, but the vectorized reader returns NULL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
