hive-issues mailing list archives

From "Zoltan Haindrich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17416) Hive Distinct changes column value
Date Mon, 11 Sep 2017 08:30:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160916#comment-16160916 ]

Zoltan Haindrich commented on HIVE-17416:
-----------------------------------------

I've put together a small repro test and run it on the active development branches:
it looks like master and branch-2 are not affected; however, the bug is present on branch-1.

repro qtest:
{code}
create table t (field_name string);
insert into t values
('e_2300a?fx'),
('e_2300a'),
('x');

select
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from t;

select distinct
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from t;
{code}
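
As a cross-check of what the two queries above should return, here is a minimal sketch that evaluates both patterns outside Hive. It is only an approximation: Python's `re` module stands in for the Java regex engine that Hive uses, the helper name `regexp_extract` is hypothetical, and returning an empty string when nothing matches is an assumption about Hive's behaviour rather than something verified here.

```python
import re

# The patterns as the regex engine sees them, i.e. after the SQL string
# literal's '\\' has been unescaped to a single backslash.
R_PATTERN = r'([A-Z]_[0-9]*[A-Z]?)\??.*'
W_PATTERN = r'([A-Z]_[0-9]*[a-z]?)\??.*'


def regexp_extract(value, pattern, group):
    """Hypothetical stand-in for REGEXP_EXTRACT: first match anywhere in value."""
    match = re.search(pattern, value)
    # Assumption: an empty string is returned when the pattern does not match.
    return match.group(group) if match else ''


# Same three rows as the repro table t.
rows = ['e_2300a?fx', 'e_2300a', 'x']

# Plain SELECT: one (r_field_name, w_field_name) pair per input row.
projected = [
    (regexp_extract(v.upper(), R_PATTERN, 1),
     regexp_extract(v.upper(), W_PATTERN, 1))
    for v in rows
]

# SELECT DISTINCT should only dedupe those pairs, never change a value.
distinct = sorted(set(projected))

print(projected)
print(distinct)
```

Under these assumptions the two columns differ for the first two rows (the trailing letter is kept only by the `[A-Z]?` variant), so a correct DISTINCT must keep them different as well.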

> Hive Distinct changes column value
> ----------------------------------
>
>                 Key: HIVE-17416
>                 URL: https://issues.apache.org/jira/browse/HIVE-17416
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Manoj Durisheti
>
> Hive 1.2.1000.2.6.1.0-129
> The query below with DISTINCT is expected to just dedupe the resulting data, but it alters the data.
> *Query without Distinct:*
> select
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
> from alpha.table_name
> where
> datestamp = 20170805
> and
> field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
> ;
> Result:
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> *Query with Distinct:*
> select distinct
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
> from alpha.table_name
> where
> datestamp = 20170805
> and
> field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
> ;
> Result:
> e_2300 e_2300
> *Expected Result with Distinct is: *
> e_2300a e_2300



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
