hive-issues mailing list archives

From "Ke Jia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
Date Fri, 08 Sep 2017 01:40:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157994#comment-16157994 ]

Ke Jia commented on HIVE-17139:
-------------------------------

Uploaded the latest patch to fix the failed tests; the remaining failed tests do not appear to be
patch related.
I tested the patch on the product_reviews table of TPCx-BB using the following SQL statement:
{code:sql}
select case
         when pr_review_rating=4 then upper(pr_review_content)
         when pr_review_rating=3 then upper(pr_review_content)
       end
from product_reviews;
{code}
The cluster has 8 nodes with 230G per node. The CPU is Intel(R) Xeon(R) CPU E5-2699.
With a 3TB data scale and Spark as the execution engine, the results are as follows:
|| ||without patch||with patch||improvement(s)||improvement(%)||
|HoS|28.25s|16.14s|12.11s|42.8%|
|VectorSelectOperator|12.58s|2.99s|9.59s|76.2%|
The results show that the Spark execution time drops from 28.25s to 16.14s and the time spent in
VectorSelectOperator drops from 12.58s to 2.99s.
The total number of records and the numbers of "pr_review_rating=4" and "pr_review_rating=3" records
are as follows:
|| ||count||
|total records|9934636|
|pr_review_rating=4 records|1897804|
|pr_review_rating=3 records|792278|
With this patch, only (1897804+792278) records go through the upper operation in the above SQL statement,
whereas without this patch (9934636+9934636) records go through it.
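The row counts above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the class and method names (UpperCallCount, withoutPatch, withPatch) are made up for this example and are not part of the patch.

```java
// Hypothetical helper, not part of HIVE-17139: it only reproduces the
// arithmetic on the record counts quoted in this comment.
final class UpperCallCount {

    // Without the patch, every branch evaluates upper() on every row.
    static long withoutPatch(long totalRows, int branches) {
        return totalRows * branches;
    }

    // With the patch, each branch evaluates upper() only on its matching rows.
    static long withPatch(long... matchingRowsPerBranch) {
        long sum = 0;
        for (long rows : matchingRowsPerBranch) {
            sum += rows;
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = withoutPatch(9934636L, 2);
        long after = withPatch(1897804L, 792278L);
        System.out.println("upper() evaluations without patch: " + before);
        System.out.println("upper() evaluations with patch:    " + after);
        System.out.printf("fraction of the original work: %.3f%n", (double) after / before);
    }
}
```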

> Conditional expressions optimization: skip the expression evaluation if the condition
is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, HIVE-17139.3.patch, HIVE-17139.4.patch,
HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, HIVE-17139.8.patch
>
>
> The execution of case when and if statements in Hive vectorization is not optimal: in the
current implementation, all the conditional and else expressions are evaluated for every row. The optimized
approach is to update the selected array of the batch parameter after the conditional expression
is executed, so that the else expression is evaluated only on the selected rows instead of all rows.
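The selected-array idea in the description can be illustrated with a small, self-contained sketch. It does not use Hive's classes and is not the code from the attached patches: the class SelectedArraySketch, its evalCaseWhen method, and the upperCalls counter are all hypothetical, and the batch is modelled as two plain arrays. After each condition is checked, the rows that matched get their result and the selected array is narrowed to the rows that did not match, so later branches never touch rows that are already decided.

```java
import java.util.Locale;

// Hypothetical, simplified model of a vectorized batch; not Hive's implementation.
final class SelectedArraySketch {

    // Counts how many times upper() is actually evaluated.
    int upperCalls = 0;

    // Evaluates: case when rating=4 then upper(content)
    //                 when rating=3 then upper(content) end
    // A null entry in the result stands for SQL NULL (no branch matched).
    String[] evalCaseWhen(int[] rating, String[] content) {
        int n = rating.length;
        String[] out = new String[n];

        // Initially every row of the batch is selected.
        int[] selected = new int[n];
        for (int i = 0; i < n; i++) {
            selected[i] = i;
        }
        int size = n;

        int[] conditions = {4, 3};
        for (int cond : conditions) {
            int[] remaining = new int[size];
            int remainingSize = 0;
            for (int j = 0; j < size; j++) {
                int row = selected[j];
                if (rating[row] == cond) {
                    // The "then" expression runs only on rows that matched.
                    out[row] = content[row].toUpperCase(Locale.ROOT);
                    upperCalls++;
                } else {
                    // Unmatched rows stay selected for the next branch.
                    remaining[remainingSize++] = row;
                }
            }
            selected = remaining;
            size = remainingSize;
        }
        return out;
    }

    public static void main(String[] args) {
        SelectedArraySketch sketch = new SelectedArraySketch();
        String[] out = sketch.evalCaseWhen(
                new int[] {4, 3, 5, 1, 4},
                new String[] {"a", "b", "c", "d", "e"});
        System.out.println(String.join(",", java.util.Arrays.toString(out)));
        System.out.println("upper() calls: " + sketch.upperCalls);
    }
}
```

In this toy batch of five rows, upper() runs once per matching row rather than once per row per branch, which is the same effect the record counts in the comment describe at scale.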



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
