hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zoltan Haindrich (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-24040) Slightly odd behaviour with CHAR comparisons and string literals
Date Wed, 07 Oct 2020 15:01:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-24040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209610#comment-17209610
] 

Zoltan Haindrich commented on HIVE-24040:
-----------------------------------------

{code}
select cast('a' as char(10)) = cast('a ' as varchar(50))
{code}

in psql I got some interesting results:
{code}
select length(cast('a ' as varchar(10))),length(cast('a ' as char(10) ) ),cast('a ' as varchar(10))=cast('a
' as char(10) );
 length | length | ?column? 
--------+--------+----------
      2 |      1 | t
{code}

in Hive for the above case the comparision should happen in "string" for which the lengths
are different => will not match
{code}
select length(cast(cast('a' as char(10)) as string)),length(cast(cast('a ' as varchar(50))
as string))
+------+------+
| _c0  | _c1  |
+------+------+
| 1    | 2    |
+------+------+
{code}

I feel that this is somewhere in the gray zone...will dig into the sql specs...

> Slightly odd behaviour with CHAR comparisons and string literals
> ----------------------------------------------------------------
>
>                 Key: HIVE-24040
>                 URL: https://issues.apache.org/jira/browse/HIVE-24040
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Tim Armstrong
>            Priority: Major
>
> If t is a char column, this statement behaves a bit strangely - since the RHS is a STRING,
I would have expected it to behave consistently with other CHAR/STRING comparisons, where
the CHAR column has its trailing spaces removed and the STRING does not have its trailing
spaces removed.
> {noformat}
> select count(*) from ax where t = cast('a         ' as string);
> {noformat}
> Instead it seems to be treated the same as if it was a plain literal, interpreted as
CHAR, i.e.
> {noformat}
> select count(*) from ax where t = 'a         ';
> {noformat}
> Here are some more experiments I did based on https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/in_typecheck_char.q
that seem to show some inconsistencies.
> {noformat}
> -- Hive version 3.1.3000.7.2.1.0-287 r4e72e59f1c2a51a64e0ff37b14bd396cd4e97b98
> create table ax(s char(1),t char(10));
> insert into ax values ('a','a'),('a','a '),('b','bb');
> -- varchar literal preserves trailing space
> select count(*) from ax where t = cast('a         ' as varchar(50));
> +------+
> | _c0  |
> +------+
> | 0    |
> +------+
> -- explicit cast of literal to string removes trailing space
> select count(*) from ax where t = cast('a         ' as string);
> +------+
> | _c0  |
> +------+
> | 2    |
> +------+
> -- other string expressions preserve trailing space
> select count(*) from ax where t = concat('a', '         ');
> +------+
> | _c0  |
> +------+
> | 0    |
> +------+
> -- varchar col preserves trailing space
> create table stringv as select cast('a  ' as varchar(50));
> select count(*) from ax, stringv where t = `_c0`;
> +------+
> | _c0  |
> +------+
> | 0    |
> +------+
> -- string col preserves trailing space
> create table stringa as select 'a  ';
> select count(*) from ax, stringa where t = `_c0`;
> +------+
> | _c0  |
> +------+
> | 0    |
> +------+
> {noformat}
> [~jcamachorodriguez] [~kgyrtkirk]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message