drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
Date Thu, 11 Jan 2018 20:22:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322887#comment-16322887 ]

ASF GitHub Bot commented on DRILL-5846:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r161039122
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -874,6 +880,46 @@ public void setSafe(int index, BigDecimal value) {
           set(index, value);
         }
     
    +    /**
    +     * Copies the bulk input into this value vector and extends its capacity if necessary.
    +     * @param input bulk input
    +     */
    +    public <T extends VLBulkEntry> void setSafe(VLBulkInput<T> input) {
    +      setSafe(input, null);
    +    }
    +
    +    /**
    +     * Copies the bulk input into this value vector and extends its capacity if necessary. The callback
    +     * mechanism allows decoration as caller is invoked for each bulk entry.
    +     *
    +     * @param input bulk input
    +     * @param callback a bulk input callback object (optional)
    +     */
    +    public <T extends VLBulkEntry> void setSafe(VLBulkInput<T> input, VLBulkInput.BulkInputCallback<T> callback) {
    --- End diff --
    
    This code is not Parquet specific. It can be triggered by any reader that wants to load
    data in bulk. Vectors currently expose Mutator APIs for loading single values; I see no
    good reason that should prevent us from passing bulk values instead of a single value at
    a time, since the single-value restriction prevents us from optimizing the code. Look at
    the ByteBuffer APIs: they allow you to pass single byte values but also byte arrays.
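
    To illustrate the analogy, here is a minimal sketch, assuming the comment refers to
    java.nio.ByteBuffer, which offers both put(byte) and put(byte[]). The class and method
    names (BulkPutSketch, copySingle, copyBulk) are hypothetical and are not part of Drill
    or of this pull request.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Drill code.
public class BulkPutSketch {

  // Single-value style: one put(byte) call per element,
  // analogous to a Mutator API that accepts one value at a time.
  public static ByteBuffer copySingle(byte[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length);
    for (byte b : values) {
      buf.put(b);
    }
    buf.flip(); // switch from writing to reading
    return buf;
  }

  // Bulk style: a single put(byte[]) call transfers the whole array,
  // analogous to passing a bulk input to the vector.
  public static ByteBuffer copyBulk(byte[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length);
    buf.put(values);
    buf.flip(); // switch from writing to reading
    return buf;
  }

  public static void main(String[] args) {
    byte[] data = "drill".getBytes(StandardCharsets.UTF_8);
    // Both approaches yield buffers with identical contents.
    System.out.println(copySingle(data).equals(copyBulk(data)));
  }
}
```

    Both methods produce the same buffer contents; the bulk variant simply hands the whole
    array to the buffer in one call instead of paying a per-element call.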


> Improve Parquet Reader Performance for Flat Data types 
> -------------------------------------------------------
>
>                 Key: DRILL-5846
>                 URL: https://issues.apache.org/jira/browse/DRILL-5846
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.11.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>              Labels: performance
>             Fix For: 1.13.0
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to further improve
> the Parquet Reader performance, as several users reported that Parquet parsing represents the
> lion's share of the overall query execution time. It tracks flat data types only, as nested data
> types might involve functional and processing enhancements (e.g., a nested column can be seen as
> a document; a user might want to perform operations scoped at the document level, with no need
> to span all rows). Another JIRA will be created to handle the nested columns use-case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
