drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching
Date Fri, 18 Aug 2017 19:18:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133508#comment-16133508 ]

ASF GitHub Bot commented on DRILL-5697:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/907#discussion_r134032343
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/RegexpUtil.java ---
    @@ -47,18 +47,55 @@
           "[:alnum:]", "\\p{Alnum}"
       };
     
    +  // type of pattern string.
    +  public enum sqlPatternType {
    --- End diff --
    
    Class name format: type names should be upper camel case, i.e. `SqlPatternType`.


> Improve performance of filter operator for pattern matching
> -----------------------------------------------------------
>
>                 Key: DRILL-5697
>                 URL: https://issues.apache.org/jira/browse/DRILL-5697
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library for pattern matching.
> However, for patterns such as %abc (ends with abc), abc% (starts with abc), and %abc% (contains abc),
> implementing the match with simple string comparisons instead of the regex library
> was observed to give a good performance boost (4-6x). The idea is to use special-case code for these
> simple, common cases and fall back to the Java regex library for complicated ones. This should provide
> a good performance benefit for the most common cases.
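
The special-casing idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class name `SimpleLikeMatcher`, the enum constants, and the helper `toRegex` are all hypothetical, and the sketch assumes there is no ESCAPE clause, so `%` and `_` are always treated as wildcards.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: classify a LIKE pattern once, then match each value
// with plain String calls when possible and with java.util.regex otherwise.
final class SimpleLikeMatcher {

  // Illustrative categories; the constants in the actual patch may differ.
  public enum SqlPatternType { STARTS_WITH, ENDS_WITH, CONTAINS, CONSTANT, COMPLEX }

  private final SqlPatternType type;
  private final String literal; // text compared in the simple cases
  private final Pattern regex;  // compiled only for COMPLEX patterns

  SimpleLikeMatcher(String likePattern) {
    int n = likePattern.length();
    boolean leading = n > 0 && likePattern.charAt(0) == '%';
    boolean trailing = n > 1 && likePattern.charAt(n - 1) == '%';
    String core = likePattern.substring(leading ? 1 : 0, trailing ? n - 1 : n);

    if (core.indexOf('%') >= 0 || core.indexOf('_') >= 0) {
      // Wildcards remain inside the pattern: fall back to the regex library.
      type = SqlPatternType.COMPLEX;
      literal = null;
      regex = Pattern.compile(toRegex(likePattern), Pattern.DOTALL);
    } else {
      type = leading && trailing ? SqlPatternType.CONTAINS
           : leading ? SqlPatternType.ENDS_WITH
           : trailing ? SqlPatternType.STARTS_WITH
           : SqlPatternType.CONSTANT;
      literal = core;
      regex = null;
    }
  }

  SqlPatternType getType() {
    return type;
  }

  boolean matches(String input) {
    switch (type) {
      case STARTS_WITH: return input.startsWith(literal);
      case ENDS_WITH:   return input.endsWith(literal);
      case CONTAINS:    return input.contains(literal);
      case CONSTANT:    return input.equals(literal);
      default:          return regex.matcher(input).matches();
    }
  }

  // Translate LIKE wildcards to regex, quoting every literal run.
  private static String toRegex(String p) {
    StringBuilder sb = new StringBuilder();
    int start = 0;
    for (int i = 0; i <= p.length(); i++) {
      boolean atEnd = i == p.length();
      if (atEnd || p.charAt(i) == '%' || p.charAt(i) == '_') {
        if (i > start) {
          sb.append(Pattern.quote(p.substring(start, i)));
        }
        if (!atEnd) {
          sb.append(p.charAt(i) == '%' ? ".*" : ".");
        }
        start = i + 1;
      }
    }
    return sb.toString();
  }
}
```

Under these assumptions, `new SimpleLikeMatcher("abc%")` is classified as `STARTS_WITH` and never touches the regex engine, while a pattern such as `a_c%` is classified as `COMPLEX` and is compiled once, then reused for every row.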



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
