hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key
Date Fri, 26 Aug 2016 01:31:13 GMT
I can repro this on master. I’ll file a bug...

From: Stephen Sprague <spragues@gmail.com<mailto:spragues@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Thursday, August 25, 2016 at 13:34
To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

Hi Gopal,
Thank you for this insight.  good stuff.   The thing is there is no 'foo' for etl_database_source
so that filter if anything should be short-circuited to 'true'.  ie. double nots.   1. not
in  2. and foo not present.

it doesn't matter what what i put in that "not in" clause the filter always comes back false
if the column is a partition_key of course.

thanks for the tip on explain extended.... that's some crazy output so i'm sifting for clues
in that now.   i hear you though - something in there with the metastore is at play.

Cheers,
Stephen.

On Thu, Aug 25, 2016 at 1:12 PM, Gopal Vijayaraghavan <gopalv@apache.org<mailto:gopalv@apache.org>>
wrote:

> anybody run up against this one?  hive 2.1.0 + using a  "not in" on a
>list + the column is a partition key participant.

The partition filters are run before the plan is generated.

>    AND etl_source_database not in ('foo')

Is there a 'foo' in etl_source_database?

>             predicate: false (type: boolean)  #### this kills any hope
>of the query returning anything.
...
>  Select Operator        ###doesn't even mention a filter

This is probably good news, because that's an optimization.

PrunedPartitionList getPartitionsFromServer(Table tab, final
ExprNodeGenericFuncDesc compactExpr ...) {
...
          hasUnknownPartitions = Hive.get().getPartitionsByExpr(
              tab, compactExpr, conf, partitions);
}


goes into the metastore and evaluates the IN and NOT IN for partitions
ahead of time.


So, this could mean that the partition pruning evaluation returned no
partitions at all (or just exactly matched partitions only, skipping the
filter per-row).

In 2.x, you might notice it does a bit fancier things there as well, like

select count(1) from table where year*10000 + month*100 + day >= 20160101;

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hi
ve/ql/optimizer/ppr/PartitionPruner.java#L468


You can try "explain extended" and see which partitions are selected (&
validate that the filter removed was applied already).

Cheers,
Gopal





Mime
View raw message