hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Marron <>
Subject Empty Query Results
Date Tue, 06 Aug 2013 12:48:11 GMT

I'm running Hive 0.11.0 over Hadoop 1.0.4.
I have a problem with some queries returning empty results.
I have a UDF called rownumber which is deterministic.
Specifically I have two tables, championsvalues and championsrows.
They both have custom imputformats which effect filtering by examining the filter pushed down.
(Looking for predicates using the rownumber UDF.)
The table championsvalues is partitioned whilst the table championsrows is not.
I get the following behaviour:

-bash-4.1$ hive
13/08/06 13:38:47 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local
no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you
are connecting to a remote metastore.

Logging initialized using configuration in jar:file:/home/pmarron/hive-0.11.0/lib/hive-common-0.11.0.jar!/
Hive history file=/tmp/pmarron/
    > select *,rownumber() from championsvalues where aid=6 and rownumber() between 4 and
Time taken: 1.417 seconds
    > select *,rownumber() from championsrows where rownumber() between 7 and 9;
1996    Juventus        Ajax    1-1 (4-2 pens)  Rome    67000   8
1995    Ajax    AC Milan        1-0     Vienna  49730   9
1994    AC MIlan        Barcelona       4-0     Athens  70000   10
Time taken: 0.141 seconds, Fetched: 3 row(s)
    > select *,rownumber() from championsvalues where aid=6 and rownumber(away) between
4 and 7;
1982    Aston Villa     Bayern Munich   1-0     Rotterdam       46000   6       5
1998    Real Madrid     Juventus        1-0     Amsterdam       47500   6       6
1981    Liverpool       Real Madrid     1-0     Paris   48360   6       7
Time taken: 0.167 seconds, Fetched: 3 row(s)
    > quit;

In other words a query over the partitioned table returns nothing unless I pass an argument
to the UDF. I assume that this is an optimization because it evaluates the UDF once, outside
of the query and determines that no rows will be returned. That would explain why, when
I invoke the UDF with an argument it can't make this optimization and so ends up returning
What I can't understand is, if this is what's really going on, then why isn't the query over
un-partitioned table pruned in the same way?

Is this to do with query optimization or is there something else going on?
I have attached a log of the same queries being run with debugging switched on.

This is driving me mad, so any help appreciated.


View raw message