hive-dev mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5436) Hive's casting behavior needs to be consistent
Date Wed, 30 Oct 2013 00:32:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808625#comment-13808625 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-5436:
---------------------------------------------------------

[~xuefu] My reasons for fixing the consistency first:
1. I wanted to see how intermediate results are handled for numeric types. For example, for
tinyint, (1228+1228)/20 gives an in-range result, whereas the intermediate result 1228+1228
does not fit in a tinyint. This scenario will be very common with exponential notation.
2. HIVE-5382 will need a baseline for comparing string cast results with non-string cast
results. My plan was to use test cases like this:
select cast('-1.5e2' as int)-cast(-1.5e2 as int) from tmp
and verify that the result is always 0. This will ensure consistency for casts from string
to numeric types, and it will expose any existing bug that later gets fixed for only one of
the cast types, since the non-string cast and the string cast are handled separately.
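
To make both points concrete, here is a rough Java sketch. It is not Hive code: the class and
every helper in it are hypothetical, and it assumes plain Java narrowing semantics for the
tinyint cast. The first pair of methods shows how the result of (1228+1228)/20 depends on
whether the intermediate sum is narrowed. The second pair models the baseline check with a
deliberately strict string path, to show how a divergence between the two cast paths would
surface as a non-zero (or missing) difference.

```java
public class CastBaselineSketch {
    // Narrow a double the way a Java (byte) cast does:
    // double -> int (truncate toward zero), then int -> byte (keep the low 8 bits).
    static byte toTinyInt(double value) {
        return (byte) value;
    }

    // Point 1: narrow only the final result of (a + b) / divisor.
    static byte narrowAtEnd(int a, int b, int divisor) {
        return toTinyInt((a + b) / (double) divisor);
    }

    // Point 1, other order: narrow the intermediate sum first, then divide.
    static byte narrowIntermediate(int a, int b, int divisor) {
        byte sum = (byte) (a + b);
        return toTinyInt(sum / (double) divisor);
    }

    // Hypothetical numeric cast path: plain Java narrowing of a double value.
    static Integer numericPath(double value) {
        return (int) value;
    }

    // Hypothetical string cast path that only understands plain integers;
    // anything else (such as exponential notation) becomes null.
    static Integer strictStringPath(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // The proposed baseline check: both paths must agree, i.e. the difference is 0.
    static boolean consistent(String literal) {
        Integer fromString = strictStringPath(literal);
        Integer fromNumber = numericPath(Double.parseDouble(literal));
        return fromString != null && fromString - fromNumber == 0;
    }

    public static void main(String[] args) {
        System.out.println(narrowAtEnd(1228, 1228, 20));        // 122
        System.out.println(narrowIntermediate(1228, 1228, 20)); // -5
        System.out.println(consistent("150"));                  // true
        System.out.println(consistent("-1.5e2"));               // false
    }
}
```

In this model the in-range answer 122 is only obtained when the intermediate sum 2456 is kept
wide, and the literal '-1.5e2' is flagged because the strict string path cannot parse it while
the numeric path yields -150.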

> Hive's casting behavior needs to be consistent
> ----------------------------------------------
>
>                 Key: HIVE-5436
>                 URL: https://issues.apache.org/jira/browse/HIVE-5436
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Critical
>
> Hive's casting behavior is inconsistent, and the behavior of casting from one type to
> another is currently undocumented when the value being cast is out of range. For example,
> casting out-of-range values from one type to another can produce incorrect results.
> E.g.:
> 1. select cast('1000' as tinyint) from t1;
> NULL
> 2. select 1000Y from t1;
> FAILED: SemanticException [Error 10029]: Line 1:7 Invalid numerical constant '1000Y'
> 3. select cast(1000 as tinyint) from t1;
> -24
> 4. select cast(1.1e3-1000/0 as tinyint) from t1;
> 0
> 5. select cast(10/0 as tinyint) from pw18;
> -1
> A Hive user can accidentally try to cast an out-of-range value. For example, in
> examples 4 and 5, even though the final result is NaN, Hive casts it to a seemingly random
> result. Either we should document that end users must handle overflow, underflow, division
> by 0, etc. themselves, or we should return NULL when the final result is out of range.
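
For reference, the outputs of examples 3, 4 and 5 are what plain Java narrowing conversions
would produce. Whether Hive's cast actually falls through to these conversions is an
assumption here, not something stated in the issue. The sketch below uses hypothetical names:
wrappingCast shows the wrap-around behavior, and nullOnOverflow shows one possible shape of
the "return NULL when out of range" alternative.

```java
public class TinyIntCastSketch {
    // Java-style narrowing: double -> int (truncates; infinities saturate to
    // Integer.MAX_VALUE / Integer.MIN_VALUE, NaN becomes 0), then int -> byte
    // (keeps only the low 8 bits).
    static byte wrappingCast(double value) {
        return (byte) value;
    }

    // Hypothetical alternative: return null unless the truncated value fits in
    // the tinyint range [-128, 127].
    static Byte nullOnOverflow(double value) {
        if (Double.isNaN(value)
                || value <= Byte.MIN_VALUE - 1
                || value >= Byte.MAX_VALUE + 1) {
            return null;
        }
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(wrappingCast(1000));                // -24 (example 3)
        System.out.println(wrappingCast(1.1e3 - 1000.0 / 0));  // 0   (example 4)
        System.out.println(wrappingCast(10.0 / 0));            // -1  (example 5)
        System.out.println(nullOnOverflow(1000));              // null
        System.out.println(nullOnOverflow(10.0 / 0));          // null
        System.out.println(nullOnOverflow(100));               // 100
    }
}
```

Note that in Java floating-point arithmetic 10.0/0 is positive infinity and 1.1e3 - 1000.0/0
is negative infinity, which is how the sketch arrives at -1 and 0 respectively.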



--
This message was sent by Atlassian JIRA
(v6.1#6144)
