drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6147) Limit batch size for Flat Parquet Reader
Date Sun, 04 Mar 2018 21:31:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385395#comment-16385395 ]

Paul Rogers commented on DRILL-6147:

To follow up, we should look at all sides of the issue. One factor overlooked in my previous
note is that code now is better than code later.

DRILL-6147 is available today and will immediately give users a performance boost. The result
set loader is large and will take some months to commit, and so can't offer a benefit until then.

It is hard to argue that we should wait. Let's get DRILL-6147 in now, then revisit the issue later
(doing the proposed test) once the result set loader is available.

And, as discussed, DRILL-6147 works only for the flat Parquet reader. We'll need the result
set loader for the Parquet reader that reads nested types.

> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) when creating
scan batches; there is no parameter nor any logic for controlling the amount of memory used.
This enhancement will allow Drill to take an extra input parameter to control direct memory usage.
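
As a rough illustration of the idea in the description above (capping a scan batch by a memory budget rather than only by a fixed row count), here is a minimal sketch. It is not Drill's implementation: the function name, the parameters, and the fixed per-column widths are all hypothetical, and the only figure taken from the issue is the 32k-row limit.

```python
from typing import Sequence

# The "32k rows" cap mentioned in the issue description.
HARD_CODED_ROW_LIMIT = 32 * 1024


def rows_per_batch(memory_budget_bytes: int,
                   column_widths_bytes: Sequence[int],
                   row_limit: int = HARD_CODED_ROW_LIMIT) -> int:
    """Return how many rows fit in one batch under a memory budget.

    Hypothetical helper: assumes every column has a known, fixed width in
    bytes, which ignores variable-length data and vector overhead.
    """
    if memory_budget_bytes <= 0:
        raise ValueError("memory budget must be positive")
    row_width = sum(column_widths_bytes)
    if row_width <= 0:
        raise ValueError("row width must be positive")
    rows_by_memory = memory_budget_bytes // row_width
    # Keep the existing row cap, but shrink the batch when rows are wide.
    # Always allow at least one row so a single wide row cannot stall the scan.
    return max(1, min(row_limit, rows_by_memory))
```

With narrow rows the row cap still applies; with wide rows the memory budget becomes the binding limit and the batch shrinks accordingly.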

