hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-5436) Hive's casting behavior needs to be consistent
Date Wed, 30 Oct 2013 22:25:25 GMT


Xuefu Zhang commented on HIVE-5436:

[~hsubramaniyan] thanks for your explanation. Personally, I think that for HIVE-5382 the
current focus should be on making in-range casting work as expected, deferring the unification
of error handling cases to HIVE-5660. As for intermediate results, Hive evaluates an expression
tree. In general, the moment a node evaluates to null, its parent operator evaluates to null as well.
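
To make the null-propagation point concrete, here is a minimal sketch in Java. The names (NullPropagationDemo, Expr, constant, add) are hypothetical and are not Hive's actual evaluator classes; the sketch only illustrates the general rule that a parent node yields null once any child does.

```java
public class NullPropagationDemo {
    // Hypothetical expression node: eval() returns null to represent SQL NULL.
    public interface Expr {
        Integer eval();
    }

    // Leaf node holding a constant (possibly null).
    public static Expr constant(Integer value) {
        return () -> value;
    }

    // Parent node: if either child evaluates to null, the sum is null too.
    public static Expr add(Expr left, Expr right) {
        return () -> {
            Integer a = left.eval();
            Integer b = right.eval();
            if (a == null || b == null) {
                return null;
            }
            return a + b;
        };
    }

    public static void main(String[] args) {
        Expr ok = add(constant(1), constant(2));
        Expr withNull = add(add(constant(1), constant(null)), constant(2));
        System.out.println(ok.eval());        // 3
        System.out.println(withNull.eval());  // null
    }
}
```

In this sketch the null produced by the inner add reaches the root, so the whole expression evaluates to null.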

Feel free to work on HIVE-5660 if you like. I'm just a little concerned that the work
you do will probably be thrown away because of HIVE-5356. This is why I planned to work
on it after HIVE-5356, which I'm currently working on. Please let me know your plan. Thanks.

> Hive's casting behavior needs to be consistent
> ----------------------------------------------
>                 Key: HIVE-5436
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Critical
> Hive's casting behavior is inconsistent, and the behavior of casting from one type to
> another is currently undocumented when the cast value is out of range. For example, casting
> out-of-range values from one type to another can produce incorrect results.
> Eg: 
> 1. select cast('1000'  as tinyint) from t1;
> 2. select 1000Y from t1;
> FAILED: SemanticException [Error 10029]: Line 1:7 Invalid numerical constant '1000Y'
> 3.  select cast(1000 as tinyint) from t1;
> -24
> 4. select cast(1.1e3-1000/0 as tinyint) from t1;
> 0
> 5. select cast(10/0 as tinyint) from pw18; 
> -1
> A Hive user can accidentally try to cast an out-of-range value. For example, in
> examples 4 and 5, even though the final result is NaN, Hive casts it to an arbitrary value.
> Either we should document that the end user must take care of overflow, underflow, division
> by 0, etc. themselves, or we should return NULLs when the final result is out of range.
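
For illustration only: Java's primitive narrowing conversions happen to reproduce the values quoted above for cast(1000 as tinyint), cast(1.1e3-1000/0 as tinyint), and cast(10/0 as tinyint). Whether Hive's cast delegates to this conversion is an assumption here, not something this message confirms. The checkedToTinyint helper is hypothetical and sketches the "return NULL when out of range" alternative; the division is done on doubles, since integer division by zero would throw in Java.

```java
public class NarrowingDemo {
    // Plain Java narrowing: keeps only the low 8 bits of the int.
    public static byte narrowInt(int v) {
        return (byte) v;
    }

    // double -> byte first converts to int (NaN becomes 0, infinities
    // saturate to the int limits), then keeps the low 8 bits.
    public static byte narrowDouble(double v) {
        return (byte) v;
    }

    // Hypothetical range-checked cast: null stands in for SQL NULL when the
    // value is NaN or outside the tinyint range [-128, 127].
    public static Byte checkedToTinyint(double v) {
        if (Double.isNaN(v) || v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
            return null;
        }
        return (byte) v;
    }

    public static void main(String[] args) {
        System.out.println(narrowInt(1000));                   // -24
        System.out.println(narrowDouble(1.1e3 - 1000.0 / 0));  // 0  (-Infinity)
        System.out.println(narrowDouble(10.0 / 0));            // -1 (+Infinity)
        System.out.println(checkedToTinyint(1000));            // null
        System.out.println(checkedToTinyint(100));             // 100
    }
}
```

With the checked variant, every out-of-range input maps to null, which matches the second option proposed in the description.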

This message was sent by Atlassian JIRA
