hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries
Date Sun, 02 Aug 2015 16:39:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651085#comment-14651085 ]

Matt McCline commented on HIVE-11415:
-------------------------------------


The example could be viewed as an extension of SQL's IN clause:

{code}
column_name IN (value1,value2,...)
{code}

where we extend it to support struct constants/tuples in IN:

{code}
(t, si) IN ((1,2), (2,3), (3,4), (4,5), ...)
{code}

Rather than evaluating 8,000 OR expression nodes, do a single hash table lookup.
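
A minimal sketch of the hash lookup idea, for illustration only. The class TupleInFilter and its methods are hypothetical names, not part of Hive; the sketch assumes both columns are plain ints and evaluates one row at a time rather than a vectorized batch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; this is not Hive's vectorization code.
final class TupleInFilter {
    // Each element is one (t, si) constant from the IN list.
    private final Set<List<Integer>> allowed = new HashSet<>();

    TupleInFilter(int[][] tuples) {
        for (int[] tuple : tuples) {
            allowed.add(Arrays.asList(tuple[0], tuple[1]));
        }
    }

    // One hash lookup per row, however many tuples the IN list holds.
    boolean matches(int t, int si) {
        return allowed.contains(Arrays.asList(t, si));
    }

    public static void main(String[] args) {
        TupleInFilter filter =
            new TupleInFilter(new int[][] {{1, 2}, {2, 3}, {3, 4}, {4, 5}});
        System.out.println(filter.matches(2, 3)); // true
        System.out.println(filter.matches(2, 4)); // false
    }
}
```

The cost of building the set is paid once; after that the per-row work no longer grows with the number of constants, which is the point of replacing the OR chain.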

When there are many OR expressions over different columns / expressions, the vectorized
OR operator could be generalized to ANY (as Gopal suggested) so that a single evaluation
could look at more than 2 conditions.  I share Gopal's concern, though, that the planner may
make subtle assumptions about OR having exactly 2 arguments.
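
One way the operands for such an n-ary ANY might be collected is sketched below. The Expr, Leaf, and Or types are hypothetical stand-ins, not Hive's expression classes, and nothing here reflects how the planner actually represents OR. The sketch walks a nested binary OR tree with an explicit stack, so a tree a few thousand levels deep does not consume call stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical expression tree; these types are not Hive's classes.
final class OrFlattener {
    interface Expr {}

    static final class Leaf implements Expr {
        final String text;
        Leaf(String text) { this.text = text; }
    }

    static final class Or implements Expr {
        final Expr left;
        final Expr right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Returns the non-OR operands of a nested OR tree in left-to-right order.
    // Iterative: pending nodes live on a heap-allocated deque, not the call stack.
    static List<Expr> flatten(Expr root) {
        List<Expr> operands = new ArrayList<>();
        Deque<Expr> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Expr e = pending.pop();
            if (e instanceof Or) {
                Or or = (Or) e;
                pending.push(or.right); // pushed first so the left side is visited first
                pending.push(or.left);
            } else {
                operands.add(e);
            }
        }
        return operands;
    }
}
```

The resulting list could then feed a single operator that takes any number of children, assuming the rest of the planner tolerates that shape.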

Note: today vectorization does not support structs.

> Add early termination for recursion in vectorization for deep filter queries
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-11415
>                 URL: https://issues.apache.org/jira/browse/HIVE-11415
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Prasanth Jayachandran
>            Assignee: Matt McCline
>
> Queries with deep (left-deep) filters throw a StackOverflowError in vectorization:
> {code}
> Exception in thread "main" java.lang.StackOverflowError
> 	at java.lang.Class.getAnnotation(Class.java:3415)
> 	at org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
> {code}
> Sample query:
> {code}
> explain select count(*) from over1k where (
> (t=1 and si=2)
> or (t=2 and si=3)
> or (t=3 and si=4) 
> or (t=4 and si=5) 
> or (t=5 and si=6) 
> or (t=6 and si=7) 
> or (t=7 and si=8)
> ...
> ..
> {code}
> Repeat the filter a few thousand times to reproduce the issue.
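
For reproduction, the long filter does not need to be typed by hand. The sketch below generates it, following the (t=N and si=N+1) pattern of the sample query. DeepFilterQuery and build are hypothetical names, and the closing parenthesis is an assumption, since the sample query above is truncated before its end.

```java
// Hypothetical helper for generating the repro query text.
final class DeepFilterQuery {
    // Builds the sample query with the given number of OR-ed clauses.
    static String build(int clauses) {
        StringBuilder sb =
            new StringBuilder("explain select count(*) from over1k where (\n");
        for (int i = 1; i <= clauses; i++) {
            if (i > 1) {
                sb.append("or ");
            }
            sb.append("(t=").append(i).append(" and si=").append(i + 1).append(")\n");
        }
        // Assumed terminator: the original sample is elided before this point.
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(8000));
    }
}
```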



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
