drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5697) Improve performance of filter operator for pattern matching
Date Fri, 18 Aug 2017 19:18:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133517#comment-16133517 ]

ASF GitHub Bot commented on DRILL-5697:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/907#discussion_r134035974
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java ---
    @@ -157,6 +157,967 @@ public void testRegexpReplace() throws Exception {
       }
     
       @Test
    +  public void testLikeStartsWith() throws Exception {
    +
    +    // all ASCII.
    +    testBuilder()
    --- End diff --
    
    The regex parsing and execution code is becoming complex. Let's test it with a true unit
    test, not just a system-level test that runs a query. See the available test frameworks.
    We can also discuss in person.


> Improve performance of filter operator for pattern matching
> -----------------------------------------------------------
>
>                 Key: DRILL-5697
>                 URL: https://issues.apache.org/jira/browse/DRILL-5697
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>
> Queries that filter with the SQL LIKE operator use the Java regex library for pattern
> matching. However, for simple patterns such as %abc (ends with abc), abc% (starts with abc),
> and %abc% (contains abc), implementing the match with simple string code instead of the regex
> library has been observed to give a significant performance boost (4-6x). The idea is to use
> special-case code for these simple, common patterns and fall back to the Java regex library
> for complicated ones, so that the most common cases get the performance benefit.
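A minimal sketch of the special-casing idea described in the issue, written against plain Java `String` values rather than Drill's value vectors, and assuming non-null inputs and no LIKE escape character. The class `LikeMatcher`, its `compile` and `matches` methods, and the `Kind` enum are hypothetical names for illustration, not Drill's API. Patterns whose only wildcards are a leading and/or trailing `%` map to `equals`, `startsWith`, `endsWith`, or `contains`; anything else (an interior `%`, or any `_`) is translated to a `java.util.regex.Pattern` as the fallback.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: pick a fast path for simple SQL LIKE patterns and fall
// back to java.util.regex otherwise. Assumes String inputs, non-null values,
// and no LIKE escape character.
final class LikeMatcher {

  enum Kind { CONSTANT, STARTS_WITH, ENDS_WITH, CONTAINS, REGEX }

  final Kind kind;
  private final String literal; // text used by the fast paths
  private final Pattern regex;  // non-null only when kind == REGEX

  private LikeMatcher(Kind kind, String literal, Pattern regex) {
    this.kind = kind;
    this.literal = literal;
    this.regex = regex;
  }

  static LikeMatcher compile(String like) {
    boolean leading = like.startsWith("%");
    // A lone "%" counts as leading only, so it is not consumed twice.
    boolean trailing = like.length() > (leading ? 1 : 0) && like.endsWith("%");
    String middle = like.substring(leading ? 1 : 0,
        trailing ? like.length() - 1 : like.length());

    // Interior '%' or any '_' is not a simple case: use the regex fallback.
    if (middle.indexOf('%') >= 0 || middle.indexOf('_') >= 0) {
      return new LikeMatcher(Kind.REGEX, null,
          Pattern.compile(toRegex(like), Pattern.DOTALL));
    }
    Kind kind = leading && trailing ? Kind.CONTAINS
        : leading ? Kind.ENDS_WITH
        : trailing ? Kind.STARTS_WITH
        : Kind.CONSTANT;
    return new LikeMatcher(kind, middle, null);
  }

  // Translate LIKE wildcards to regex; literal runs are quoted so that
  // characters such as '.' or '*' are matched verbatim.
  private static String toRegex(String like) {
    StringBuilder out = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%' || c == '_') {
        if (literal.length() > 0) {
          out.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        out.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      out.append(Pattern.quote(literal.toString()));
    }
    return out.toString();
  }

  // Per-row check: a single String call on the fast paths, a full regex
  // match (anchored at both ends) on the fallback path.
  boolean matches(String s) {
    switch (kind) {
      case CONSTANT:    return s.equals(literal);
      case STARTS_WITH: return s.startsWith(literal);
      case ENDS_WITH:   return s.endsWith(literal);
      case CONTAINS:    return s.contains(literal);
      default:          return regex.matcher(s).matches();
    }
  }
}
```

In this sketch the pattern is classified once in `compile`, so each row on a fast path costs one `String` method call instead of allocating a `Matcher`. For example, `LikeMatcher.compile("%abc%").matches("xxabcxx")` takes the `contains` path, while `LikeMatcher.compile("a_c")` falls back to a regex.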



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
