drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6147) Limit batch size for Flat Parquet Reader
Date Sun, 04 Mar 2018 21:31:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385395#comment-16385395 ]

Paul Rogers commented on DRILL-6147:
------------------------------------

To follow up, we should look at all sides of the issue. One factor overlooked in my previous
note is that code now is better than code later.

DRILL-6147 is available today and will immediately give users a performance boost. The result
set loader is large and will take some months to commit, and so can't offer a benefit until
then.

It is hard to argue that we should wait. Let's get DRILL-6147 in now, then revisit the issue later
(doing the proposed test) once the result set loader is available.

And, as discussed, DRILL-6147 works only for the flat Parquet reader. We'll need the result
set loader for the Parquet reader that reads nested types.


> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) when creating
> scan batches; there is no parameter nor any logic for controlling the amount of memory used.
> This enhancement will allow Drill to take an extra input parameter to control direct memory
> usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
