hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key
Date Thu, 25 Aug 2016 20:12:30 GMT

> anybody run up against this one?  hive 2.1.0 + using a  "not in" on a
>list + the column is a partition key participant.

The partition filters are run before the plan is generated.

>    AND etl_source_database not in ('foo')

Is there a 'foo' in etl_source_database?

>             predicate: false (type: boolean)  #### this kills any hope
>of the query returning anything.
...
>  Select Operator        ###doesn't even mention a filter

This is probably good news, because that's an optimization.

PrunedPartitionList getPartitionsFromServer(Table tab, final
ExprNodeGenericFuncDesc compactExpr ...) {
...
          hasUnknownPartitions = Hive.get().getPartitionsByExpr(
              tab, compactExpr, conf, partitions);
}


goes into the metastore and evaluates the IN and NOT IN for partitions
ahead of time.


So, this could mean that the partition pruning evaluation returned no
partitions at all (or just exactly matched partitions only, skipping the
filter per-row).

In 2.x, you might notice it does a bit fancier things there as well, like

select count(1) from table where year*10000 + month*100 + day >= 20160101;

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hi
ve/ql/optimizer/ppr/PartitionPruner.java#L468


You can try "explain extended" and see which partitions are selected (&
validate that the filter removed was applied already).

Cheers,
Gopal





Mime
View raw message