drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5899) Simple pattern matchers can work with DrillBuf directly
Date Fri, 03 Nov 2017 05:52:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237125#comment-16237125 ]

ASF GitHub Bot commented on DRILL-5899:

Github user ppadma commented on the issue:

    @sachouche @paul-rogers Thanks for the review. I updated the PR to address the review comments.

    I made one more change. Previously, I was copying the native memory buffer into a byte array
and using that. Going to native memory directly instead gives significantly better performance.
In fact, it is 3 times faster :-)
    Please review the updated changes.
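
    To illustrate the difference between the two approaches described above, here is a
minimal sketch. It uses java.nio.ByteBuffer (allocated off-heap) as a stand-in for DrillBuf,
and the class and method names are hypothetical, not taken from the PR. It only shows the
shape of the change and does not reproduce the measured speedup.

```java
import java.nio.ByteBuffer;

// Sketch only: ByteBuffer stands in for DrillBuf, and all names here are hypothetical.
final class DirectReadSketch {

    // Earlier approach: copy the value's bytes into a heap array, then compare against the copy.
    static boolean startsWithViaCopy(ByteBuffer buf, int start, int end, byte[] pattern) {
        byte[] copy = new byte[end - start];
        ByteBuffer view = buf.duplicate(); // independent position/limit, same memory
        view.clear();                      // resets position/limit only, data is untouched
        view.position(start);
        view.get(copy);                    // the extra per-row copy
        if (copy.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (copy[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Updated approach: read the bytes straight from the off-heap buffer, with no copy.
    static boolean startsWithDirect(ByteBuffer buf, int start, int end, byte[] pattern) {
        if (end - start < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (buf.get(start + i) != pattern[i]) { // absolute get, position is not changed
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        buf.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        byte[] pattern = {3, 4};
        System.out.println(startsWithViaCopy(buf, 2, 8, pattern));
        System.out.println(startsWithDirect(buf, 2, 8, pattern));
    }
}
```

    Both methods return the same answer; the second simply skips the intermediate array.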

> Simple pattern matchers can work with DrillBuf directly
> -------------------------------------------------------
>                 Key: DRILL-5899
>                 URL: https://issues.apache.org/jira/browse/DRILL-5899
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
> For the 4 simple patterns we have, i.e. startsWith, endsWith, contains and constant,
we do not need the overhead of charSequenceWrapper. We can work with DrillBuf directly. This
saves us from doing an isAscii check and UTF-8 decoding for each row.
> UTF-8 encoding ensures that no UTF-8 character is a prefix of any other valid character.
So, instead of decoding the varChar from each row we process, we encode the patternString
once during setup and do a raw byte comparison. Instead of bounds checking and reading one byte
at a time, we get the whole buffer in one shot and use that for comparison.
> This improved the overall performance of the filter operator by around 20%.
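
The approach in the description above (encode the pattern once, then compare raw bytes
for each row) can be sketched roughly as follows. This is an illustration under stated
assumptions, not the code from the PR: java.nio.ByteBuffer stands in for DrillBuf, and the
class name, method names, and the direct() helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: ByteBuffer stands in for DrillBuf, and all names here are hypothetical.
final class RawBytePatternSketch {
    // Pattern bytes, encoded to UTF-8 once at setup instead of decoding every row.
    private final byte[] pattern;

    RawBytePatternSketch(String patternString) {
        this.pattern = patternString.getBytes(StandardCharsets.UTF_8);
    }

    // Each method treats the value as the bytes in [start, end) of buf.
    boolean startsWith(ByteBuffer buf, int start, int end) {
        return end - start >= pattern.length && matchesAt(buf, start);
    }

    boolean endsWith(ByteBuffer buf, int start, int end) {
        return end - start >= pattern.length && matchesAt(buf, end - pattern.length);
    }

    boolean contains(ByteBuffer buf, int start, int end) {
        // Naive scan; a byte-level match always lands on a character boundary in UTF-8,
        // because lead bytes and continuation bytes use disjoint value ranges.
        for (int i = start; i + pattern.length <= end; i++) {
            if (matchesAt(buf, i)) {
                return true;
            }
        }
        return false;
    }

    boolean constant(ByteBuffer buf, int start, int end) {
        return end - start == pattern.length && matchesAt(buf, start);
    }

    // Raw byte comparison at a fixed offset; callers have already checked the bounds.
    private boolean matchesAt(ByteBuffer buf, int offset) {
        for (int i = 0; i < pattern.length; i++) {
            if (buf.get(offset + i) != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper for the demo: place a string's UTF-8 bytes in off-heap memory.
    static ByteBuffer direct(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer row = direct("drill-h\u00e9llo"); // includes one two-byte character
        int end = row.capacity();
        System.out.println(new RawBytePatternSketch("drill").startsWith(row, 0, end));
        System.out.println(new RawBytePatternSketch("h\u00e9llo").endsWith(row, 0, end));
        System.out.println(new RawBytePatternSketch("l-h").contains(row, 0, end));
        System.out.println(new RawBytePatternSketch("xyz").contains(row, 0, end));
    }
}
```

The per-row work is then a plain byte loop, with no isAscii check and no decoding step.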

This message was sent by Atlassian JIRA
