hive-dev mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results
Date Thu, 14 Aug 2014 07:05:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096682#comment-14096682 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-7166:
---------------------------------------------------------

[~jnp] Could you please commit this JIRA, or review HIVE-7260, which improves on this fix?
Vectorization in Hive with the BETWEEN operator is essentially broken unless one of the two
gets committed.

Thanks
Hari

> Vectorization with UDFs returns incorrect results
> -------------------------------------------------
>
>                 Key: HIVE-7166
>                 URL: https://issues.apache.org/jira/browse/HIVE-7166
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.13.0
>         Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
>            Reporter: Benjamin Bowman
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Minor
>         Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch
>
>
> Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results.
>
> Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) AND UDF_1
> The following test scenario will reproduce the problem:
> TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 10000):  
> package com.test;
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.io.LongWritable;
> // Registered below as the temporary function ten_thousand.
> public class tenThousand extends UDF {
>   // Reused across calls so no new object is allocated per row.
>   private final LongWritable result = new LongWritable();
>   // Takes no arguments and always returns 10000.
>   public LongWritable evaluate() {
>     result.set(10000);
>     return result;
>   }
> }
> TEST DATA (test.input):
> 1|CBCABC|12
> 2|DBCABC|13
> 3|EBCABC|14
> 40000|ABCABC|15
> 50000|BBCABC|16
> 60000|CBCABC|17
> CREATING ORC TABLE:
> 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");
> CREATE LOADING TABLE:
> 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile;
> COPY IN DATA:
> [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
> ORC DATA:
> [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;"
> LOAD TEST FUNCTION:
> 0: jdbc:hive2://server:10002/db>  add jar /opt/hadoop/lib/testFunction.jar;
> 0: jdbc:hive2://server:10002/db>  create temporary function ten_thousand as 'com.test.tenThousand';
> TURN OFF VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=false;
> QUERY (RESULTS AS EXPECTED):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> | 1      |
> | 2      |
> | 3      |
> +--------+
> 3 rows selected (15.286 seconds)
> TURN ON VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=true;
> QUERY AGAIN (WRONG RESULTS):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> +--------+
> No rows selected (17.763 seconds)
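
For reference, the expected result of the filter in the report can be checked by hand. The sketch below is plain Java, not Hive code. It assumes only what the report states: the UDF always returns 10000, and BETWEEN is inclusive on both ends. The class BetweenExpected and its methods are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenExpected {
    // Stand-in for the ten_thousand() UDF from the report: always returns 10000.
    static long tenThousand() {
        return 10000L;
    }

    // Keeps the values in [low, high]; SQL BETWEEN is inclusive on both ends.
    static List<Long> between(long[] values, long low, long high) {
        List<Long> out = new ArrayList<>();
        for (long v : values) {
            if (v >= low && v <= high) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The "first" column of test.input.
        long[] first = {1L, 2L, 3L, 40000L, 50000L, 60000L};
        long low = tenThousand() - 10000;  // lower bound: 0
        long high = tenThousand() - 9995;  // upper bound: 5
        System.out.println(between(first, low, high)); // prints [1, 2, 3]
    }
}
```

With bounds 0 and 5, the rows 1, 2 and 3 qualify, which matches the non-vectorized output above. The empty vectorized result is therefore the incorrect one.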



--
This message was sent by Atlassian JIRA
(v6.2#6252)
