From: paul-rogers
To: dev@drill.apache.org
Subject: [GitHub] drill pull request #907: DRILL-5697: Improve performance of filter operator ...
Date: Fri, 18 Aug 2017 19:17:58 +0000 (UTC)

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/907#discussion_r134035728

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ---
    @@ -85,23 +183,118 @@ public void eval() {
         @Output BitHolder out;
         @Workspace java.util.regex.Matcher matcher;
         @Workspace org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
    +    @Workspace org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternInfo patternInfo;
         @Override
         public void setup() {
    -      matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike( //
    +      patternInfo = org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
               org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer),
    -          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start, escape.end, escape.buffer))).matcher("");
    +          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start, escape.end, escape.buffer));
           charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
    -      matcher.reset(charSequenceWrapper);
    +
    +      // Use java regex and compile pattern only if it is not a simple pattern.
    +      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
    +        java.lang.String javaPatternString = patternInfo.getJavaPatternString();
    +        matcher = java.util.regex.Pattern.compile(javaPatternString).matcher("");
    +        matcher.reset(charSequenceWrapper);
    +      }
         }
         @Override
         public void eval() {
           charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
           // Reusing same charSequenceWrapper, no need to pass it in.
           // This saves one method call since reset(CharSequence) calls reset()
    -      matcher.reset();
    -      out.value = matcher.matches()? 1:0;
    +
    +      // Not a simple case. Just use Java regex.
    +      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
    --- End diff --

    We are doing a switch (actually, a chain of ifs) on the pattern type for every value, and this is a tight inner loop. It would be far better to generate an instance of the proper class once, in setup(), and have eval() call a single method on it to do the work.
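A minimal sketch of the reviewer's suggestion, assuming the pattern has already been classified into a shape (the diff's sqlToRegexLike already returns a pattern type). Every name below (SqlPatternMatcher, the StartsWith/EndsWith/Contains/Constant/Regex classes, PatternType, create()) is hypothetical and chosen for illustration; it is not Drill's API, and the set of "simple" shapes is an assumption. The point is that the type dispatch runs once, at setup time, so the per-row path is a single interface call with no branching on the pattern type.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from the PR or from Drill.
class LikeMatcherSketch {

  // One implementation per pattern shape; the per-row code makes one interface call.
  interface SqlPatternMatcher {
    boolean match(CharSequence input);
  }

  // True if `in` holds exactly `s` starting at `offset`; false if out of bounds.
  static boolean regionEquals(CharSequence in, int offset, String s) {
    if (offset < 0 || offset + s.length() > in.length()) return false;
    for (int i = 0; i < s.length(); i++) {
      if (in.charAt(offset + i) != s.charAt(i)) return false;
    }
    return true;
  }

  // LIKE 'abc%': compare only the leading characters.
  static final class StartsWith implements SqlPatternMatcher {
    private final String prefix;
    StartsWith(String prefix) { this.prefix = prefix; }
    @Override public boolean match(CharSequence in) { return regionEquals(in, 0, prefix); }
  }

  // LIKE '%abc': compare only the trailing characters.
  static final class EndsWith implements SqlPatternMatcher {
    private final String suffix;
    EndsWith(String suffix) { this.suffix = suffix; }
    @Override public boolean match(CharSequence in) {
      return regionEquals(in, in.length() - suffix.length(), suffix);
    }
  }

  // LIKE '%abc%': naive substring scan.
  static final class Contains implements SqlPatternMatcher {
    private final String needle;
    Contains(String needle) { this.needle = needle; }
    @Override public boolean match(CharSequence in) {
      for (int i = 0; i + needle.length() <= in.length(); i++) {
        if (regionEquals(in, i, needle)) return true;
      }
      return false;
    }
  }

  // LIKE 'abc': exact match.
  static final class Constant implements SqlPatternMatcher {
    private final String text;
    Constant(String text) { this.text = text; }
    @Override public boolean match(CharSequence in) {
      return in.length() == text.length() && regionEquals(in, 0, text);
    }
  }

  // Anything else: fall back to java.util.regex, compiling the pattern once.
  static final class Regex implements SqlPatternMatcher {
    private final Matcher matcher;
    Regex(String javaRegex) { matcher = Pattern.compile(javaRegex).matcher(""); }
    @Override public boolean match(CharSequence in) { return matcher.reset(in).matches(); }
  }

  enum PatternType { STARTS_WITH, ENDS_WITH, CONTAINS, CONSTANT, NOT_SIMPLE }

  // Meant to be called once (e.g. from setup()); the dispatch happens here, not per row.
  static SqlPatternMatcher create(PatternType type, String literal, String javaRegex) {
    switch (type) {
      case STARTS_WITH: return new StartsWith(literal);
      case ENDS_WITH:   return new EndsWith(literal);
      case CONTAINS:    return new Contains(literal);
      case CONSTANT:    return new Constant(literal);
      default:          return new Regex(javaRegex);
    }
  }

  public static void main(String[] args) {
    // e.g. col LIKE 'dri%', assumed to be classified as STARTS_WITH with literal "dri"
    SqlPatternMatcher m = create(PatternType.STARTS_WITH, "dri", null);
    for (String row : new String[] {"drill", "spark", "dr"}) {
      // Per-row work is one call, mirroring `out.value = ... ? 1 : 0` in eval()
      System.out.println(row + " -> " + (m.match(row) ? 1 : 0));
    }
  }
}
```

With this shape, eval() could reduce to something like `out.value = likeMatcher.match(charSequenceWrapper) ? 1 : 0;` (where likeMatcher is a hypothetical workspace field). When a call site only ever sees one implementation, the JIT can often inline it, which is one reason this tends to be cheaper than re-testing the pattern type for every value.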