drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-6147) Limit batch size for Flat Parquet Reader
Date Mon, 05 Mar 2018 02:19:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385489#comment-16385489 ]

Aman Sinha edited comment on DRILL-6147 at 3/5/18 2:18 AM:
-----------------------------------------------------------

{quote}The question for you and [~sachouche] is simply this. Given that we have a working
mechanism, does it make sense to invent another one? Do we want to have duplicate maintenance
costs? Have to make changes in two places? And so on?
{quote}
Certainly, if the two methods perform similarly when reading flat data, we should not
maintain a separate code path. The result set loader will be more flexible, so we should
go with that. I agree about the testing parameters that should be considered while evaluating
both readers. In my mind, the TPC-DS data is already a well-established testbed for flat
structures: it has NULL values and multiple variable-width columns, so we should use it
for the experiments.


was (Author: amansinha100):
{quote}The question for you and [~sachouche] is simply this. Given that we have a working
mechanism, does it make sense to invent another one? Do we want to have duplicate maintenance
costs? Have to make changes in two places? And so on?
{quote}
Certainly, if the two methods perform similarly when reading flat data, we should not
maintain a separate code path. The result set loader will be more flexible, so we should
go with that. I agree about the testing parameters that should be considered while evaluating
both readers.

> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) when creating
> scan batches; there is neither a parameter nor any logic for controlling the amount of
> memory used. This enhancement will allow Drill to take an extra input parameter to control
> direct memory usage.
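
To illustrate the idea in the description above, here is a minimal sketch of how a row-count cap and a memory budget might be combined when sizing a batch. This is not Drill's implementation: the class name, method name, the reading of "32k" as 32 * 1024, and the use of an estimated average row width are all assumptions made for illustration.

```java
/** Hypothetical sketch: bound a scan batch by both a row cap and a memory budget. */
public final class BatchSizeSketch {

    // Assumption: "32k rows" in the description is taken to mean 32 * 1024.
    static final int MAX_ROWS_PER_BATCH = 32 * 1024;

    /**
     * Returns the number of rows to place in one batch.
     *
     * @param memoryBudgetBytes      hypothetical per-batch memory limit (the "extra input parameter")
     * @param estimatedRowWidthBytes hypothetical estimate of the average row width
     */
    static int rowsPerBatch(long memoryBudgetBytes, long estimatedRowWidthBytes) {
        if (memoryBudgetBytes <= 0 || estimatedRowWidthBytes <= 0) {
            throw new IllegalArgumentException("budget and row width must be positive");
        }
        // Rows that fit in the budget, but always at least one so the reader makes progress.
        long rowsThatFit = Math.max(1L, memoryBudgetBytes / estimatedRowWidthBytes);
        // The existing row cap still applies as an upper bound.
        return (int) Math.min((long) MAX_ROWS_PER_BATCH, rowsThatFit);
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // 16 MiB, an arbitrary example value
        System.out.println(rowsPerBatch(budget, 1024)); // wide rows: the memory budget is the limit
        System.out.println(rowsPerBatch(budget, 64));   // narrow rows: the row cap is the limit
    }
}
```

With wide rows the memory budget determines the batch size, while with narrow rows the existing row cap still applies, so a single extra parameter would be enough to bound direct memory usage under these assumptions.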



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
