drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
Date Fri, 02 Feb 2018 16:55:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350643#comment-16350643 ]

ASF GitHub Bot commented on DRILL-5846:

Github user sachouche commented on a diff in the pull request:

    --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
    @@ -703,7 +703,18 @@ protected void _setLong(int index, long value) {
       public ByteBuf getBytes(int index, ByteBuf dst, int dstIndex, int length) {
    -    udle.getBytes(index + offset, dst, dstIndex, length);
    +    final int BULK_COPY_THR = 1024;
    --- End diff --
    - I had a chat with @bitblender and he explained that Java was invoking a stub (not a function
call) to perform copyMemory; he agreed that copyMemory will be slower for small buffers, so the
task was to determine the cutoff point.
    - My tests (I will send you my test) indicate that 1024 bytes is the length
at which copyMemory starts performing the same as getByte().
    NOTE - I am using JRE 1.8; static buffers initialized once; payload of 1 MB (1048576 bytes)
and a loop count of 102400; macOS High Sierra; 1 thread; 4 GB MX, MS
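
    The cutoff idea discussed above can be sketched roughly as follows. This is only an
illustration, not the DrillBuf code: the class ThresholdCopy and its methods are hypothetical,
plain byte arrays replace the Netty buffers, and System.arraycopy stands in for copyMemory.
Treating a length of exactly 1024 as "bulk" is also an assumption.

```java
/** Hypothetical illustration of a size-based copy cutoff; not the DrillBuf implementation. */
final class ThresholdCopy {
    // Cutoff reported in the comment above (1024 bytes); it was measured on one machine only.
    static final int BULK_COPY_THR = 1024;

    /** Returns true when a copy of the given length would take the bulk path. */
    static boolean usesBulkCopy(int length) {
        return length >= BULK_COPY_THR;
    }

    /** Copies length bytes from src to dst, picking the strategy by size. */
    static void copy(byte[] src, int srcIndex, byte[] dst, int dstIndex, int length) {
        if (length < 0 || srcIndex < 0 || dstIndex < 0
                || srcIndex > src.length - length
                || dstIndex > dst.length - length) {
            throw new IndexOutOfBoundsException("invalid copy range");
        }
        if (!usesBulkCopy(length)) {
            // Small copy: a byte-at-a-time loop avoids the fixed cost of the bulk call.
            for (int i = 0; i < length; i++) {
                dst[dstIndex + i] = src[srcIndex + i];
            }
        } else {
            // Large copy: one bulk call (System.arraycopy stands in for copyMemory here).
            System.arraycopy(src, srcIndex, dst, dstIndex, length);
        }
    }
}
```

    Both paths produce the same bytes; only the cost profile differs, so the threshold is a
tuning value that would need to be re-measured per JVM and platform.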

> Improve Parquet Reader Performance for Flat Data types 
> -------------------------------------------------------
>                 Key: DRILL-5846
>                 URL: https://issues.apache.org/jira/browse/DRILL-5846
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.11.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>              Labels: performance
>             Fix For: 1.13.0
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to further improve
the Parquet Reader performance, as several users reported that Parquet parsing represents the
lion's share of the overall query execution time. It tracks Flat Data types only, as Nested DTs might
involve functional and processing enhancements (e.g., a nested column can be seen as a Document;
a user might want to perform operations scoped at the document level, that is, with no need to span
all rows). Another JIRA will be created to handle the nested columns use-case.

