drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching
Date Fri, 18 Aug 2017 19:18:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133516#comment-16133516 ]

ASF GitHub Bot commented on DRILL-5697:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/907#discussion_r134035728
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java ---
    @@ -85,23 +183,118 @@ public void eval() {
         @Output BitHolder out;
         @Workspace java.util.regex.Matcher matcher;
         @Workspace org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
    +    @Workspace org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternInfo patternInfo;
     
         @Override
         public void setup() {
    -      matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike( //
    +      patternInfo = org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlToRegexLike(
               org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end,  pattern.buffer),
    -          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start, escape.end,  escape.buffer))).matcher("");
    +          org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(escape.start, escape.end,  escape.buffer));
           charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
    -      matcher.reset(charSequenceWrapper);
    +
    +      // Use java regex and compile pattern only if it is not a simple pattern.
    +      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
    +        java.lang.String javaPatternString = patternInfo.getJavaPatternString();
    +        matcher = java.util.regex.Pattern.compile(javaPatternString).matcher("");
    +        matcher.reset(charSequenceWrapper);
    +      }
         }
     
         @Override
         public void eval() {
           charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
           // Reusing same charSequenceWrapper, no need to pass it in.
           // This saves one method call since reset(CharSequence) calls reset()
    -      matcher.reset();
    -      out.value = matcher.matches()? 1:0;
    +
    +      // Not a simple case. Just use Java regex.
    +      if (patternInfo.getPatternType() == org.apache.drill.exec.expr.fn.impl.RegexpUtil.sqlPatternType.NOT_SIMPLE) {
    --- End diff --
    
    We are doing a switch (actually, a chain of ifs) per value. This is a tight inner loop.
    It would be far better to simply generate an instance of the proper class and call a
    single method to do the work.
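
    To illustrate the suggestion, here is a minimal sketch of choosing the matcher
    implementation once and calling a single method per value. All names
    (PatternMatchers, Kind, ValueMatcher, create, countMatches) are hypothetical and
    are not the classes from the pull request; plain Strings stand in for Drill's
    buffer-backed values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical names throughout; not the classes from the pull request. */
final class PatternMatchers {

  /** Pattern kinds, decided once when the LIKE pattern is parsed. */
  enum Kind { STARTS_WITH, ENDS_WITH, CONTAINS, REGEX }

  /** The single method the inner loop calls for every value. */
  interface ValueMatcher {
    boolean matches(String value);
  }

  /** Runs once (the setup() phase): pick the implementation up front. */
  static ValueMatcher create(Kind kind, String text) {
    switch (kind) {
      case STARTS_WITH: return value -> value.startsWith(text);
      case ENDS_WITH:   return value -> value.endsWith(text);
      case CONTAINS:    return value -> value.contains(text);
      default:
        // General case: compile once, then reuse the same Matcher for every value.
        Matcher m = Pattern.compile(text).matcher("");
        return value -> m.reset(value).matches();
    }
  }

  /** Stand-in for eval(): no branching on the pattern kind per value. */
  static int countMatches(ValueMatcher matcher, String[] values) {
    int count = 0;
    for (String value : values) {
      if (matcher.matches(value)) {
        count++;
      }
    }
    return count;
  }
}
```

    With this shape, the branch on the pattern type runs once in create(), and the
    per-value work is one interface call.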


> Improve performance of filter operator for pattern matching
> -----------------------------------------------------------
>
>                 Key: DRILL-5697
>                 URL: https://issues.apache.org/jira/browse/DRILL-5697
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library for pattern matching.
> However, for patterns such as %abc (ends with abc), abc% (starts with abc), and %abc% (contains abc),
> implementing the match with simple code instead of the regex library was observed to give
> a good performance boost (4-6x). The idea is to use special-case code for the simple, common
> patterns and fall back to the Java regex library for complicated ones. That should provide a good
> performance benefit for the most common cases.
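
The idea in the description can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code from the patch: SimpleLike, like, and toRegex are hypothetical names, escape characters are ignored, and values are plain Strings rather than Drill buffers.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; ignores LIKE escape characters for brevity. */
final class SimpleLike {

  /** True if s contains a LIKE wildcard (% or _). */
  private static boolean hasWildcard(String s) {
    return s.indexOf('%') >= 0 || s.indexOf('_') >= 0;
  }

  /** Translates a LIKE pattern to a Java regex: % becomes .*, _ becomes ., the rest is quoted. */
  static String toRegex(String like) {
    StringBuilder sb = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%') {
        sb.append(".*");
      } else if (c == '_') {
        sb.append('.');
      } else {
        sb.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return sb.toString();
  }

  /** Evaluates value LIKE pattern, using plain string operations for the simple shapes. */
  static boolean like(String value, String pattern) {
    boolean lead = pattern.startsWith("%");
    boolean trail = pattern.length() > 1 && pattern.endsWith("%");
    String core = pattern.substring(lead ? 1 : 0, pattern.length() - (trail ? 1 : 0));
    if (!core.isEmpty() && !hasWildcard(core)) {
      if (lead && trail) return value.contains(core);   // %abc%
      if (lead) return value.endsWith(core);            // %abc
      if (trail) return value.startsWith(core);         // abc%
    }
    // Everything else (wildcards in the middle, no wildcards at all) falls back
    // to the regex library. Compiled per call only to keep the sketch short.
    return Pattern.compile(toRegex(pattern), Pattern.DOTALL).matcher(value).matches();
  }
}
```

A real implementation would classify the pattern and compile any regex once per query rather than once per value.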



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
