Date: Fri, 21 Jul 2017 05:21:00 +0000 (UTC)
From: "Ke Jia (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

    [ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095796#comment-16095796 ]

Ke Jia commented on HIVE-17139:
-------------------------------

With this patch, I tested "select case when a=1 then trim(b) end from test_orc_5000" on my development machine. The table test_orc_5000(a int, b string) is stored as ORC and holds almost 50 million records. The execution engine is Spark. I ran three experiments; the averages are in the table below. The Spark execution time dropped from 35.76s to 32.57s, the time spent in VectorSelectOperator dropped from 3.12s to 0.89s, and the number of THEN expression evaluations dropped from 49999735 to 5000712.

|| ||Non-optimization||Optimization||Improvement||
|HoS|35.76s|32.57s|8.9%|
|VectorSelectOperator|3.12s|0.89s|71.5%|
|THEN evaluation count|49999735|5000712|90.0%|

> Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not optimal: the current implementation evaluates all the conditional and else expressions for every row. The optimized approach is to update the selected array of the batch parameter after the conditional expression is executed. The else expression is then evaluated only for the selected rows instead of all rows.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
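
The approach described in the issue can be illustrated with a minimal sketch. This is not the code from HIVE-17139.1.patch and it does not use Hive's classes: ConditionalSkipSketch, caseWhen, evalThen and thenEvaluations are hypothetical names, and the batch is reduced to plain arrays plus a selected index array. It only shows the idea of narrowing the selection after the condition is evaluated, so that the branch expression runs on the matching rows alone.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a vectorized batch; not Hive's implementation.
final class ConditionalSkipSketch {

    /** Counts how often the THEN expression is evaluated (for illustration only). */
    static int thenEvaluations = 0;

    /** Stand-in for the THEN expression trim(b). */
    static String evalThen(String b) {
        thenEvaluations++;
        return b.trim();
    }

    /**
     * Evaluates "case when a = 1 then trim(b) end" for the rows listed in
     * selected[0..size). Rows that fail the condition keep a null result.
     */
    static String[] caseWhen(int[] a, String[] b, int[] selected, int size) {
        String[] out = new String[a.length];

        // Step 1: evaluate the condition and narrow the selection to matching rows.
        int[] narrowed = new int[size];
        int narrowedSize = 0;
        for (int j = 0; j < size; j++) {
            int row = selected[j];
            if (a[row] == 1) {
                narrowed[narrowedSize++] = row;
            }
        }

        // Step 2: evaluate the THEN expression only for the narrowed selection.
        for (int j = 0; j < narrowedSize; j++) {
            int row = narrowed[j];
            out[row] = evalThen(b[row]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 1, 3};
        String[] b = {" x ", " y ", "z  ", " w"};
        int[] selected = {0, 1, 2, 3};
        String[] out = caseWhen(a, b, selected, selected.length);
        // Prints: [x, null, z, null] evaluations=2
        System.out.println(Arrays.toString(out) + " evaluations=" + thenEvaluations);
    }
}
```

In this sketch only two of the four rows reach evalThen. An else branch could be handled the same way by collecting the rows that fail the condition into a second index array and evaluating the else expression over that array only.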