Date: Fri, 17 May 2013 14:37:16 +0000 (UTC)
From: "Teddy Choi (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-4548) Speed up vectorized LIKE filter for special cases abc%, %abc and %abc%

    [ https://issues.apache.org/jira/browse/HIVE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660756#comment-13660756 ]

Teddy Choi commented on HIVE-4548:
----------------------------------

Review request on https://reviews.apache.org/r/11222/

> Speed up vectorized LIKE filter for special cases abc%, %abc and %abc%
> ----------------------------------------------------------------------
>
>                 Key: HIVE-4548
>                 URL: https://issues.apache.org/jira/browse/HIVE-4548
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: vectorization-branch
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>            Priority: Minor
>             Fix For: vectorization-branch
>
>         Attachments: HIVE-4548.1-with-benchmark.patch.txt, HIVE-4548.1-without-benchmark.patch.txt
>
>
> Speed up vectorized LIKE filter evaluation for abc%, %abc, and %abc% pattern special cases (here, abc is just a place holder for some fixed string).
>
> Problem: The current vectorized LIKE implementation always calls the standard LIKE function code in UDFLike.java. But this is pretty expensive. It calls multiple functions and allocates at least one new object per call. Probably 80% of uses of LIKE are for the simple patterns abc%, %abc, and %abc%. These can be implemented much more efficiently.
> Start by speeding up the case for
>   Column LIKE "abc%"
>
> The goal would be to minimize expense in the inner loop. Don't use new() in the inner loop, and write a static function that checks the prefix of the string matches the like pattern as efficiently as possible, operating directly on the byte array holding UTF-8-encoded string data, and avoiding unnecessary additional function calls and if/else logic. Call that in the inner loop.
> If feasible, consider using a template-driven approach, with an instance of the template expanded for each of the three cases. Start doing the abc% (prefix match) by hand, then consider templatizing for the other two cases.
> The code is in the "vectorization" branch of the main hive repo.
>
> Start by checking in the constructor for FilterStringColLikeStringScalar.java if the pattern is one of the simple special cases. If so, record that, and have the evaluate() method call a special-case function for each case, i.e. the general case, and each of the 3 special cases. All the dynamic decision-making would be done once per vector, not once per element.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
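
For readers of the archive, the approach described in the quoted issue (classify the pattern once, then run an allocation-free byte-level prefix check in the inner loop) could be sketched roughly as below. This is not the code from the attached patches. The class `LikePatternSketch`, its methods, and the simplified `vector`/`start`/`length` arrays are hypothetical names chosen for illustration, and the sketch ignores nulls, existing selection vectors, and the suffix and substring cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names come from the Hive code base.
final class LikePatternSketch {

    enum Kind { PREFIX, SUFFIX, CONTAINS, GENERAL }

    /** Classify the pattern once, e.g. in a constructor, not per row. */
    static Kind classify(String pattern) {
        int n = pattern.length();
        boolean leading = n > 0 && pattern.charAt(0) == '%';
        boolean trailing = n > 1 && pattern.charAt(n - 1) == '%';
        int from = leading ? 1 : 0;
        int to = trailing ? n - 1 : n;
        if (from >= to) {
            return Kind.GENERAL; // no literal text, e.g. "", "%" or "%%"
        }
        for (int i = from; i < to; i++) {
            char c = pattern.charAt(i);
            if (c == '%' || c == '_' || c == '\\') {
                return Kind.GENERAL; // inner wildcard or escape character
            }
        }
        if (leading && trailing) return Kind.CONTAINS;
        if (trailing) return Kind.PREFIX;
        if (leading) return Kind.SUFFIX;
        return Kind.GENERAL; // no wildcard at all; left to the general path
    }

    /** UTF-8 bytes of the literal part of a simple pattern such as "abc%". */
    static byte[] literal(String pattern) {
        int from = pattern.startsWith("%") ? 1 : 0;
        int to = pattern.endsWith("%") ? pattern.length() - 1 : pattern.length();
        to = Math.max(from, to);
        return pattern.substring(from, to).getBytes(StandardCharsets.UTF_8);
    }

    /** Byte-wise prefix check on UTF-8 data; allocates nothing. */
    static boolean startsWith(byte[] data, int start, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[start + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Inner loop for the prefix case. Writes the indices of matching rows
     * into `selected` and returns how many rows matched.
     */
    static int filterPrefix(byte[][] vector, int[] start, int[] length,
                            int n, byte[] prefix, int[] selected) {
        int out = 0;
        for (int i = 0; i < n; i++) {
            if (startsWith(vector[i], start[i], length[i], prefix)) {
                selected[out++] = i;
            }
        }
        return out;
    }
}
```

Comparing raw bytes is safe for the prefix case because both the column data and the pattern literal are valid UTF-8, so a matching byte prefix implies a matching character prefix. A caller would run `classify` and `literal` once per expression and then pick `filterPrefix` or the general path once per vector.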