hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Damien Carol (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6819) Correctness issue with Hive limit operator & predicate push down
Date Thu, 03 Apr 2014 19:27:24 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Damien Carol updated HIVE-6819:
-------------------------------

    Description: 
Following query produces 0 rows with Predicate Push Down optimization turned on; the same
query produces 130 rows with predicate push down turned off.
{code:sql}
SELECT t2.c_int FROM (select key, value, c_float, c_int from t1 ORDER BY key, value, c_float,
c_int LIMIT 10) t1 JOIN t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float WHERE t2.c_int>=1;
{code}
I could reproduce this on Apache Trunk.
Haven't checked if previous releases have the same issue.

hive> desc t1;
Query ID = jpullokkaran_20140401191515_36e441c6-074b-45ae-aff6-489e13a6f401
OK
key string 
value string 
c_int int 
c_float float 
c_boolean boolean 
Time taken: 0.077 seconds, Fetched: 5 row(s)
hive> select distinct key, value, c_float, c_int from t1; 
OK
1	 1	1.0	1
1 1 1.0	1
1	1	1.0	1
1 1 1.0	1
null	null	NULL	NULL
Time taken: 0.062 seconds, Fetched: 5 row(s)
hive> desc t2;
Query ID = jpullokkaran_20140401191616_dfbd14bb-b5b8-4165-8d01-e9a61a7f1c33
OK
key string 
value string 
c_int int 
c_float float 
c_boolean boolean 
Time taken: 0.062 seconds, Fetched: 5 row(s)
hive> select distinct key, value, c_float, c_int from t2;
OK
1	 1	1.0	1
1 1 1.0	1
1	1	1.0	1
1 1 1.0	1
2	2	2.0	2
null	null	NULL	NULL
Time taken: 4.698 seconds, Fetched: 6 row(s)
hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
MapredLocal task succeeded
OK
Time taken: 13.029 seconds
hive>
hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
MapredLocal task succeeded
OK
...
1
1
1
1
1
1
1
1
1
1
1
Time taken: 9.317 seconds, Fetched: 130 row(s)
hive>

  was:
Following query produces 0 rows with Predicate Push Down optimization turned on; the same
query produces 130 rows with predicate push down turned off.
select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;

I could reproduce this on Apache Trunk.
Haven't checked if previous releases have the same issue.

hive> desc t1;
Query ID = jpullokkaran_20140401191515_36e441c6-074b-45ae-aff6-489e13a6f401
OK
key string 
value string 
c_int int 
c_float float 
c_boolean boolean 
Time taken: 0.077 seconds, Fetched: 5 row(s)
hive> select distinct key, value, c_float, c_int from t1; 
OK
1	 1	1.0	1
1 1 1.0	1
1	1	1.0	1
1 1 1.0	1
null	null	NULL	NULL
Time taken: 0.062 seconds, Fetched: 5 row(s)
hive> desc t2;
Query ID = jpullokkaran_20140401191616_dfbd14bb-b5b8-4165-8d01-e9a61a7f1c33
OK
key string 
value string 
c_int int 
c_float float 
c_boolean boolean 
Time taken: 0.062 seconds, Fetched: 5 row(s)
hive> select distinct key, value, c_float, c_int from t2;
OK
1	 1	1.0	1
1 1 1.0	1
1	1	1.0	1
1 1 1.0	1
2	2	2.0	2
null	null	NULL	NULL
Time taken: 4.698 seconds, Fetched: 6 row(s)
hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
MapredLocal task succeeded
OK
Time taken: 13.029 seconds
hive>
hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
MapredLocal task succeeded
OK
...
1
1
1
1
1
1
1
1
1
1
1
Time taken: 9.317 seconds, Fetched: 130 row(s)
hive>


> Correctness issue with Hive limit operator & predicate push down
> ----------------------------------------------------------------
>
>                 Key: HIVE-6819
>                 URL: https://issues.apache.org/jira/browse/HIVE-6819
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Laljo John Pullokkaran
>            Assignee: Laljo John Pullokkaran
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6819.1.patch
>
>
> Following query produces 0 rows with Predicate Push Down optimization turned on; the
same query produces 130 rows with predicate push down turned off.
> {code:sql}
> SELECT t2.c_int FROM (select key, value, c_float, c_int from t1 ORDER BY key, value,
c_float, c_int LIMIT 10) t1 JOIN t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float WHERE t2.c_int>=1;
> {code}
> I could reproduce this on Apache Trunk.
> Haven't checked if previous releases have the same issue.
> hive> desc t1;
> Query ID = jpullokkaran_20140401191515_36e441c6-074b-45ae-aff6-489e13a6f401
> OK
> key string 
> value string 
> c_int int 
> c_float float 
> c_boolean boolean 
> Time taken: 0.077 seconds, Fetched: 5 row(s)
> hive> select distinct key, value, c_float, c_int from t1; 
> OK
> 1	 1	1.0	1
> 1 1 1.0	1
> 1	1	1.0	1
> 1 1 1.0	1
> null	null	NULL	NULL
> Time taken: 0.062 seconds, Fetched: 5 row(s)
> hive> desc t2;
> Query ID = jpullokkaran_20140401191616_dfbd14bb-b5b8-4165-8d01-e9a61a7f1c33
> OK
> key string 
> value string 
> c_int int 
> c_float float 
> c_boolean boolean 
> Time taken: 0.062 seconds, Fetched: 5 row(s)
> hive> select distinct key, value, c_float, c_int from t2;
> OK
> 1	 1	1.0	1
> 1 1 1.0	1
> 1	1	1.0	1
> 1 1 1.0	1
> 2	2	2.0	2
> null	null	NULL	NULL
> Time taken: 4.698 seconds, Fetched: 6 row(s)
> hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
> MapredLocal task succeeded
> OK
> Time taken: 13.029 seconds
> hive>
> hive> select t2.c_int from (select key, value, c_float, c_int from t1 order by key,value,c_float,c_int
limit 10)t1 join t2 on t1.c_int=t2.c_int and t1.c_float=t2.c_float where t2.c_int>=1;
> MapredLocal task succeeded
> OK
> ...
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> Time taken: 9.317 seconds, Fetched: 130 row(s)
> hive>



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message