hive-issues mailing list archives

From "Ke Jia (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
Date Thu, 27 Jul 2017 03:42:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ]

Ke Jia edited comment on HIVE-17139 at 7/27/17 3:41 AM:
--------------------------------------------------------

With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my
development machine. The table test_orc_5000(a int, b string) is stored as ORC and holds
almost 50 million records. The execution engine is Spark. I ran the experiment three times;
the averages are in the table below. The Spark execution time drops from 35.76s to 32.57s,
the time spent in VectorSelectOperator drops from 3.12s to 0.89s, and the number of then
expression evaluations drops from 49999735 to 5000712.

|| ||Without optimization||With optimization||Improvement||
|HoS execution time|35.76s|32.57s|8.9%|
|VectorSelectOperator time|3.12s|0.89s|71.5%|
|then expression evaluation count|49999735|5000712|90.0%|

was (Author: jk_self):
With this patch, I test "select case when a=1 then trim(b) end from test_orc_5000" in my development
machine. The data scale is almost 50 million records in table test_orc_5000(a int, b string)
stored as ORC. The execution engine is spark. I do three experiments and the average value
is as below table. The result shows the execution time of spark from 35.76s to 32.57s, the
time cost of VectorSelectOperator from 3.12s to 0.89s and the count of then expression evaluation
from 49999735 to 5000712.

||  ||Non-optimization||Optimization||Improvement||
|Hos|35.76s|32.57s|8.9%|
|VectorSelectOperator|3.12s|0.89s|7.15%|
|count|49999735|5000712|8.99%|

> Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, HIVE-17139.3.patch
>
>
> Execution of case when and if statements under Hive vectorization is not optimal: the
> current implementation evaluates all of the conditional and else expressions for every row.
> The optimized approach is to update the selected array of the batch parameter after the
> conditional expression is executed, so that the else expression is then evaluated only for
> the selected rows instead of for all rows.
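
The idea in the description can be illustrated with a minimal sketch. It assumes a simplified batch that carries a `selected` index array and a `size`; the names `Batch`, `evaluate`, and `thenEvaluations` are hypothetical and are not taken from the attached patches or from Hive's own classes. It evaluates the test query's `case when a=1 then trim(b) end` by narrowing the selected array to the rows that satisfy the condition before running the then expression, and restores the original selection afterwards.

```java
import java.util.Arrays;

class ConditionalSkipSketch {
    /** Simplified stand-in for a vectorized row batch (hypothetical, not Hive's class). */
    static final class Batch {
        final long[] a;     // column a
        final String[] b;   // column b
        int[] selected;     // indices of the rows that still need processing
        int size;           // number of valid entries in selected

        Batch(long[] a, String[] b) {
            this.a = a;
            this.b = b;
            this.size = a.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) {
                selected[i] = i; // initially every row is selected
            }
        }
    }

    /** Counts how often the then expression runs, like the "count" row in the table. */
    static int thenEvaluations = 0;

    /** Evaluates CASE WHEN a = 1 THEN trim(b) END; a null entry stands for SQL NULL. */
    static String[] evaluate(Batch batch) {
        String[] out = new String[batch.a.length];

        // Remember the incoming selection so it can be restored for later operators.
        int[] savedSelected = batch.selected;
        int savedSize = batch.size;

        // Step 1: run the condition and keep only the row indices that satisfy it.
        int[] matched = new int[savedSize];
        int n = 0;
        for (int j = 0; j < savedSize; j++) {
            int row = savedSelected[j];
            if (batch.a[row] == 1) {
                matched[n++] = row;
            }
        }
        batch.selected = matched;
        batch.size = n;

        // Step 2: run the then expression only for the narrowed selection.
        for (int j = 0; j < batch.size; j++) {
            int row = batch.selected[j];
            String value = batch.b[row];
            out[row] = (value == null) ? null : value.trim();
            thenEvaluations++;
        }

        // Step 3: restore the original selection.
        batch.selected = savedSelected;
        batch.size = savedSize;
        return out;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(new long[] {1, 2, 1, 3},
                                new String[] {" x ", " y ", "z ", " w"});
        System.out.println(Arrays.toString(evaluate(batch)));
        System.out.println("then evaluations: " + thenEvaluations);
    }
}
```

In this sketch the then expression runs once per matching row rather than once per row in the batch, which is the kind of reduction reported in the evaluation count above. An else branch would be handled the same way, using the complement of the matched rows.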



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
