hive-issues mailing list archives

From "Ke Jia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
Date Fri, 21 Jul 2017 05:21:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ]

Ke Jia commented on HIVE-17139:
-------------------------------

With this patch applied, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my
development machine. The table test_orc_5000(a int, b string) holds almost 50 million records
stored as ORC, and the execution engine is Spark. I ran the query three times; the averages are
shown in the table below. Spark execution time dropped from 35.76s to 32.57s, the time spent in
VectorSelectOperator dropped from 3.12s to 0.89s, and the number of THEN expression evaluations
dropped from 49999735 to 5000712.

||Metric||Without optimization||With optimization||Improvement||
|Spark execution time|35.76s|32.57s|8.9%|
|VectorSelectOperator time|3.12s|0.89s|71.5%|
|THEN expression evaluations|49999735|5000712|90.0%|


> Conditional expressions optimization: skip the expression evaluation if the condition
is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch
>
>
> CASE WHEN and IF execution in Hive vectorization is not optimal: the current implementation
evaluates all the conditional and else expressions for every row in the batch. The optimized
approach is to update the selected array of the batch parameter after the conditional expression
is executed, so that the else expression is evaluated only for the selected rows instead of all rows.
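
The selected-array idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from HIVE-17139.1.patch: the Batch class is a hypothetical stand-in for a vectorized row batch (it is not Hive's own batch class), and evaluate() and thenEvaluations are names invented for this example. It hard-codes the query from the comment, CASE WHEN a = 1 THEN trim(b) END, and counts how many times the THEN expression runs.

```java
import java.util.Arrays;

public class SelectedArraySketch {
    /** Hypothetical, simplified row batch; an assumption for illustration only. */
    static final class Batch {
        final long[] a;      // column a (int in the test table)
        final String[] b;    // column b
        final String[] out;  // output column; null models SQL NULL
        int[] selected;      // indexes of the rows still to be processed
        int size;            // number of valid entries in selected

        Batch(long[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.out = new String[a.length];
            this.selected = new int[a.length];
            for (int i = 0; i < a.length; i++) {
                selected[i] = i;  // initially every row is selected
            }
            this.size = a.length;
        }
    }

    /** Counts THEN expression evaluations, mirroring the "count" metric. */
    static int thenEvaluations = 0;

    /** Evaluates CASE WHEN a = 1 THEN trim(b) END for the batch. */
    static void evaluate(Batch batch) {
        int[] savedSelected = batch.selected;
        int savedSize = batch.size;

        // Step 1: evaluate the condition and narrow the selected array
        // to the rows where it holds.
        int[] matching = new int[savedSize];
        int n = 0;
        for (int j = 0; j < savedSize; j++) {
            int row = savedSelected[j];
            if (batch.a[row] == 1) {
                matching[n++] = row;
            }
        }
        batch.selected = matching;
        batch.size = n;

        // Step 2: evaluate the THEN expression only for the selected rows.
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            batch.out[row] = batch.b[row].trim();
            thenEvaluations++;
        }

        // Step 3: restore the original selection for downstream operators.
        batch.selected = savedSelected;
        batch.size = savedSize;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(
            new long[] {1, 2, 1, 3},
            new String[] {" x ", " y ", "z ", " w"});
        evaluate(batch);
        System.out.println(Arrays.toString(batch.out)); // [x, null, z, null]
        System.out.println(thenEvaluations);            // 2
    }
}
```

Without the narrowing in step 1, the loop in step 2 would call trim() on all four rows. An ELSE branch, if present, could be handled the same way by narrowing to the rows that failed the condition.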



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
