drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5899) Simple pattern matchers can work with DrillBuf directly
Date Wed, 08 Nov 2017 01:38:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243236#comment-16243236 ]

ASF GitHub Bot commented on DRILL-5899:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1015#discussion_r149552506
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/AbstractSqlPatternMatcher.java ---
    @@ -0,0 +1,61 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.expr.fn.impl;
    +
    +import com.google.common.base.Charsets;
    +import org.apache.drill.common.exceptions.UserException;
    +import java.nio.ByteBuffer;
    +import java.nio.CharBuffer;
    +import java.nio.charset.CharacterCodingException;
    +import java.nio.charset.CharsetEncoder;
    +import static org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.logger;
    +
    +// To get good performance for most commonly used pattern matches
    +// i.e. CONSTANT('ABC'), STARTSWITH('ABC%'), ENDSWITH('%ABC') and CONTAINS('%ABC%'),
    +// we have simple pattern matchers.
    +// Idea is to have our own implementation for simple pattern matchers so we can
    +// avoid heavy weight regex processing, skip UTF-8 decoding and char conversion.
    +// Instead, we encode the pattern string and do byte comparison against native memory.
    +// Overall, this approach
    +// gives us orders of magnitude performance improvement for simple pattern matches.
    +// Anything that is not simple is considered
    +// complex pattern and we use Java regex for complex pattern matches.
    +
    +public abstract class AbstractSqlPatternMatcher implements SqlPatternMatcher {
    +  final String patternString;
    --- End diff --
    
    `protected final`


> Simple pattern matchers can work with DrillBuf directly
> -------------------------------------------------------
>
>                 Key: DRILL-5899
>                 URL: https://issues.apache.org/jira/browse/DRILL-5899
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>
> For the four simple patterns we have, i.e. startsWith, endsWith, contains and constant,
we do not need the overhead of charSequenceWrapper. We can work with DrillBuf directly. This
saves us from doing an isAscii check and UTF-8 decoding for each row.
> UTF-8 encoding ensures that no UTF-8 character is a prefix of any other valid character,
so a byte-level match is also a character-level match. So, instead of decoding the varChar
from each row we are processing, we encode the patternString once during setup and do a raw
byte comparison. Instead of bounds checking and reading one byte at a time, we get the whole
buffer in one shot and use that for comparison.
> This improved overall performance for the filter operator by around 20%.
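
To illustrate the approach described in the issue, here is a minimal sketch of "encode the pattern once, then compare raw bytes". It is not the code from the pull request: the class name BytePatternSketch and its method signatures are hypothetical, and a plain java.nio.ByteBuffer stands in for DrillBuf. It also reads one byte at a time through ByteBuffer.get(int), so it does not reproduce the "whole buffer in one shot" optimisation mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; a plain ByteBuffer stands in for DrillBuf.
final class BytePatternSketch {
  // The pattern is encoded to UTF-8 once, at setup, not once per row.
  private final byte[] pattern;

  BytePatternSketch(String patternString) {
    this.pattern = patternString.getBytes(StandardCharsets.UTF_8);
  }

  // Each method inspects the value stored in buf at byte offsets [start, end).

  boolean constant(ByteBuffer buf, int start, int end) {
    return end - start == pattern.length && regionEquals(buf, start);
  }

  boolean startsWith(ByteBuffer buf, int start, int end) {
    return pattern.length <= end - start && regionEquals(buf, start);
  }

  boolean endsWith(ByteBuffer buf, int start, int end) {
    return pattern.length <= end - start && regionEquals(buf, end - pattern.length);
  }

  boolean contains(ByteBuffer buf, int start, int end) {
    // Naive scan: try every offset at which the pattern still fits.
    for (int i = start; i + pattern.length <= end; i++) {
      if (regionEquals(buf, i)) {
        return true;
      }
    }
    return false;
  }

  // Compares the encoded pattern with the bytes starting at offset.
  // Callers guarantee that the pattern fits inside the value.
  private boolean regionEquals(ByteBuffer buf, int offset) {
    for (int i = 0; i < pattern.length; i++) {
      if (buf.get(offset + i) != pattern[i]) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch a matcher would be built once per query, for example new BytePatternSketch("ABC"), and then called for every row with that row's start and end offsets, so no per-row decoding or String allocation takes place.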



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
