hive-dev mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6511) casting from decimal to tinyint,smallint, int and bigint generates different result when vectorization is on
Date Wed, 05 Mar 2014 21:26:43 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-6511:
---------------------------------------

    Status: Patch Available  (was: Open)

> casting from decimal to tinyint,smallint, int and bigint generates different result when vectorization is on
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6511
>                 URL: https://issues.apache.org/jira/browse/HIVE-6511
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6511.1.patch, HIVE-6511.2.patch
>
>
> The query "select dc, cast(dc as int), cast(dc as smallint), cast(dc as tinyint) from vectortab10korc limit 20" generates the following result when vectorization is enabled:
> {code}
> 4619756289662.078125	-1628520834	-16770	126
> 1553532646710.316406	-1245514442	-2762	54
> 3367942487288.360352	688127224	-776	-8
> 4386447830839.337891	1286221623	12087	55
> -3234165331139.458008	-54957251	27453	61
> -488378613475.326172	1247658269	-16099	29
> -493942492598.691406	-21253559	-19895	73
> 3101852523586.039062	886135874	23618	66
> 2544105595941.381836	1484956709	-23515	37
> -3997512403067.0625	1102149509	30597	-123
> -1183754978977.589355	1655994718	31070	94
> 1408783849655.676758	34576568	-26440	-72
> -2993175106993.426758	417098319	27215	79
> 3004723551798.100586	-1753555402	-8650	54
> 1103792083527.786133	-14511544	-28088	72
> 469767055288.485352	1615620024	26552	-72
> -1263700791098.294434	-980406074	12486	-58
> -4244889766496.484375	-1462078048	30112	-96
> -3962729491139.782715	1525323068	-27332	60
> NULL	NULL	NULL	NULL
> {code}
> When vectorization is disabled, the result looks like this:
> {code}
> 4619756289662.078125	-1628520834	-16770	126
> 1553532646710.316406	-1245514442	-2762	54
> 3367942487288.360352	688127224	-776	-8
> 4386447830839.337891	1286221623	12087	55
> -3234165331139.458008	-54957251	27453	61
> -488378613475.326172	1247658269	-16099	29
> -493942492598.691406	-21253558	-19894	74
> 3101852523586.039062	886135874	23618	66
> 2544105595941.381836	1484956709	-23515	37
> -3997512403067.0625	1102149509	30597	-123
> -1183754978977.589355	1655994719	31071	95
> 1408783849655.676758	34576567	-26441	-73
> -2993175106993.426758	417098319	27215	79
> 3004723551798.100586	-1753555402	-8650	54
> 1103792083527.786133	-14511545	-28089	71
> 469767055288.485352	1615620024	26552	-72
> -1263700791098.294434	-980406074	12486	-58
> -4244889766496.484375	-1462078048	30112	-96
> -3962729491139.782715	1525323069	-27331	61
> NULL	NULL	NULL	NULL
> {code}
> This issue is visible only for certain decimal values. In the above example, rows 7, 11, 12, 15, and 19 generate different results.
> vectortab10korc table schema:
> {code}
> t                   	tinyint             	from deserializer   
> si                  	smallint            	from deserializer   
> i                   	int                 	from deserializer   
> b                   	bigint              	from deserializer   
> f                   	float               	from deserializer   
> d                   	double              	from deserializer   
> dc                  	decimal(38,18)      	from deserializer   
> bo                  	boolean             	from deserializer   
> s                   	string              	from deserializer   
> s2                  	string              	from deserializer   
> ts                  	timestamp           	from deserializer   
> 	 	 
> # Detailed Table Information	 	 
> Database:           	default             	 
> Owner:              	xyz              	 
> CreateTime:         	Tue Feb 25 21:54:28 UTC 2014	 
> LastAccessTime:     	UNKNOWN             	 
> Protect Mode:       	None                	 
> Retention:          	0                   	 
> Location:           	hdfs://host1.domain.com:8020/apps/hive/warehouse/vectortab10korc	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	1                   
> 	numRows             	10000               
> 	rawDataSize         	0                   
> 	totalSize           	344748              
> 	transient_lastDdlTime	1393365281          
> 	 	 
> # Storage Information	 	 
> SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde	 
> InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat	 
> OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat	 
> Compressed:         	No                  	 
> Num Buckets:        	-1                  	 
> Bucket Columns:     	[]                  	 
> Sort Columns:       	[]                  	 
> Storage Desc Params:	 	 
> 	serialization.format	1                   
> Time taken: 0.196 seconds, Fetched: 41 row(s)
> {code}
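
The description above does not state a root cause. One reading of the two outputs, offered only as an assumption: every row that differs has a fractional part of at least 0.5, which would be consistent with one code path truncating toward zero and the other rounding to nearest before the integer is wrapped to the target width. The Python sketch below is not Hive code; `wrap_signed` and `cast_decimal` are hypothetical helpers that reproduce row 7 of both outputs under that assumption.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def wrap_signed(n: int, bits: int) -> int:
    """Keep the low `bits` bits of n and reinterpret them as two's complement."""
    n &= (1 << bits) - 1
    return n - (1 << bits) if n >= (1 << (bits - 1)) else n


def cast_decimal(value: str, bits: int, rounding: str) -> int:
    """Hypothetical model of a decimal-to-integer cast: drop the fraction
    using `rounding`, then wrap the whole number to `bits` bits."""
    whole = int(Decimal(value).quantize(Decimal(1), rounding=rounding))
    return wrap_signed(whole, bits)


# Row 7 of the example: int, smallint, tinyint correspond to 32, 16, 8 bits.
row7 = "-493942492598.691406"
truncated = [cast_decimal(row7, b, ROUND_DOWN) for b in (32, 16, 8)]
rounded = [cast_decimal(row7, b, ROUND_HALF_UP) for b in (32, 16, 8)]

print(truncated)  # [-21253558, -19894, 74]  matches the non-vectorized output
print(rounded)    # [-21253559, -19895, 73]  matches the vectorized output
```

Under this model, rows whose fractional part is below 0.5 give the same value either way, which matches the rows that agree in both outputs.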



--
This message was sent by Atlassian JIRA
(v6.2#6252)
