hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-72) wrong results if partition pruning not strict and no map-reduce job needed
Date Tue, 18 Nov 2008 23:07:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648794#action_12648794 ]

Ashish Thusoo commented on HIVE-72:
-----------------------------------

I think the correct way to handle this is for the prune call to return something indicating
that some partitions were unknown.

Inline Comments
ql/src/java/org/apache/hadoop/hive/ql/parse/PartitionPruner.java:278	Incomplete javadocs.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:2938	What happens when
the number of parts is actually 0, i.e. no parts match the criteria (they are not unknown, but
they all evaluate to false)? In that case we would not be making this optimization. The test
case for that in our tests would be select * from srcpart where srcpart.ds = '2000-01-01'. We
clearly do not want to turn off the optimization when this happens, right?
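
To make the suggestion concrete, here is a minimal sketch of a prune call that reports unknown partitions separately from matching ones. It is an illustration only: PruneResultSketch, PruneResult, prune and canSkipMapReduce are hypothetical names, not the actual PartitionPruner or SemanticAnalyzer API. The point is that "zero partitions matched" and "some partitions were unknown" are different outcomes, and only the second should disable the optimization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the Hive code base. */
class PruneResultSketch {

    /** Outcome of pruning: partitions that definitely match, and partitions whose predicate was unknown. */
    static final class PruneResult {
        final List<String> confirmed = new ArrayList<>();
        final List<String> unknown = new ArrayList<>();

        boolean hasUnknown() {
            return !unknown.isEmpty();
        }
    }

    /** The predicate returns TRUE, FALSE, or null (unknown) for each partition value. */
    static PruneResult prune(List<String> partitions, Function<String, Boolean> predicate) {
        PruneResult result = new PruneResult();
        for (String partition : partitions) {
            Boolean verdict = predicate.apply(partition);
            if (verdict == null) {
                result.unknown.add(partition);    // could not decide: remember it separately
            } else if (verdict) {
                result.confirmed.add(partition);  // definitely matches
            }                                     // definitely FALSE: pruned away
        }
        return result;
    }

    /** Assumed rule: the no-map-reduce shortcut is safe only when nothing was unknown. */
    static boolean canSkipMapReduce(PruneResult result) {
        return !result.hasUnknown();
    }
}
```

Under this sketch, a predicate such as ds = '2000-01-01' that is definitely false for every partition yields an empty result with no unknowns, so the optimization stays on.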

> wrong results if partition pruning not strict and no map-reduce job needed
> --------------------------------------------------------------------------
>
>                 Key: HIVE-72
>                 URL: https://issues.apache.org/jira/browse/HIVE-72
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> Suppose T is a table partitioned on ds, where ds is a string column. The following queries:
>  SELECT a.* FROM T a WHERE a.ds=2008-09-08 LIMIT 1;
>  SELECT a.* FROM T a WHERE a.ds=2008-11-10 LIMIT 1;
> return the first row from the first partition.
> This is because of the typecast to double.
> For a.ds=2008-01-01, or any similar comparison (e.g. a.ds=1),
>  evaluate (Double, Double) is invoked during partition pruning.
> Since '2008-11-01' is not a valid double, it is converted to a null, and therefore the
> result of pruning returns null (unknown) - not FALSE.
> All unknowns are also accepted, therefore all partitions are accepted which explains
> this behavior.
> The filter is not invoked since it is a select * query, so no map-reduce job is started.
> We can just turn off this optimization if pruning indicates that there can be unknown partitions.
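
The mechanism described in the issue can be sketched as follows. This is a simplified model, not Hive's evaluator: PruneSketch, toDouble, equalsAsDouble and prune are hypothetical names, and the literal 1991.0 assumes the unquoted 2008-09-08 is treated as a number rather than a string. It shows how a failed cast to double becomes null, and how a non-strict pruner that accepts unknowns ends up keeping every partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of three-valued pruning; none of these names come from Hive. */
class PruneSketch {

    /** Mimics a string-to-double cast that yields null (unknown) for non-numeric input. */
    static Double toDouble(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Three-valued equality: the result is null (unknown) if either side is null. */
    static Boolean equalsAsDouble(Double left, Double right) {
        if (left == null || right == null) {
            return null;
        }
        return left.doubleValue() == right.doubleValue();
    }

    /** Non-strict pruning: keep every partition whose predicate is not definitely FALSE. */
    static List<String> prune(List<String> partitionValues, double literal) {
        List<String> kept = new ArrayList<>();
        for (String ds : partitionValues) {
            Boolean match = equalsAsDouble(toDouble(ds), literal);
            if (match == null || match) {
                kept.add(ds); // unknown is accepted along with TRUE
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed numeric value for the unquoted literal; any number gives the same outcome here.
        List<String> kept = prune(Arrays.asList("2008-09-08", "2008-11-10"), 1991.0);
        System.out.println(kept); // prints [2008-09-08, 2008-11-10]
    }
}
```

Because no date-shaped partition value parses as a double, every comparison is unknown and nothing is pruned; without a filter step afterwards, LIMIT 1 returns the first row of the first partition.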


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

