drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5271) EasyFormatPlugin creates readers for all input files at start - memory waste
Date Fri, 17 Feb 2017 03:28:41 GMT
Paul Rogers created DRILL-5271:
----------------------------------

             Summary: EasyFormatPlugin creates readers for all input files at start - memory waste
                 Key: DRILL-5271
                 URL: https://issues.apache.org/jira/browse/DRILL-5271
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10
            Reporter: Paul Rogers
            Priority: Minor


The {{EasyFormatPlugin}} creates record readers for a scan operation. The scan operation lists
the set of files to scan. The {{EasyFormatPlugin}} iterates over this list and creates a {{RecordReader}}
for each.

{code}
  public abstract RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork,
      List<SchemaPath> columns, String userName) throws ExecutionSetupException;
...
    for(FileWork work : scan.getWorkUnits()){
      RecordReader recordReader = getRecordReader(context, dfs, work, scan.getColumns(), scan.getUserName());
      readers.add(recordReader);
{code}

Consider a test with a single thread and 5000 files. The above behavior creates all 5000
{{RecordReader}} objects at query start, although the scan works through the files one at a time.
The idle readers hold onto resources that could be better used elsewhere.

Suggest creating the RecordReaders as needed, discarding the old before starting the next.
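
A minimal sketch of the suggested approach, for illustration only. It does not use the real Drill classes: {{SketchReader}}, {{LazyReaders}} and the helper methods are hypothetical stand-ins, and the "resource" is just a counter of open readers. It assumes a reader acquires its resources on construction and releases them on {{close()}}.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Sketch only: SketchReader is a hypothetical stand-in, not Drill's RecordReader.
final class LazyReaderSketch {

  /** Hypothetical reader: holds a resource from construction until close(). */
  static final class SketchReader implements AutoCloseable {
    static int open = 0;      // readers currently holding resources
    static int peakOpen = 0;  // highest value of 'open' seen since the last reset

    final String file;

    SketchReader(String file) {
      this.file = file;
      open++;
      peakOpen = Math.max(peakOpen, open);
    }

    @Override
    public void close() {
      open--;
    }
  }

  /** Creates readers one at a time, closing the previous one before the next. */
  static final class LazyReaders<W> implements Iterator<SketchReader>, AutoCloseable {
    private final Iterator<W> work;
    private final Function<W, SketchReader> factory;
    private SketchReader current;

    LazyReaders(List<W> workUnits, Function<W, SketchReader> factory) {
      this.work = workUnits.iterator();
      this.factory = factory;
    }

    @Override
    public boolean hasNext() {
      return work.hasNext();
    }

    @Override
    public SketchReader next() {
      if (!work.hasNext()) {
        throw new NoSuchElementException();
      }
      close();                               // discard the old reader first
      current = factory.apply(work.next());  // then create the next one
      return current;
    }

    @Override
    public void close() {
      if (current != null) {
        current.close();
        current = null;
      }
    }
  }

  private static List<String> files(int n) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      names.add("file-" + i);
    }
    return names;
  }

  private static void resetCounters() {
    SketchReader.open = 0;
    SketchReader.peakOpen = 0;
  }

  /** Lazy creation: returns the peak number of readers open at once. */
  static int peakOpenLazy(int fileCount) {
    resetCounters();
    try (LazyReaders<String> readers = new LazyReaders<>(files(fileCount), SketchReader::new)) {
      while (readers.hasNext()) {
        readers.next();  // a real scan would read this file's batches here
      }
    }
    return SketchReader.peakOpen;
  }

  /** Eager creation, as in the loop quoted above: all readers exist up front. */
  static int peakOpenEager(int fileCount) {
    resetCounters();
    List<SketchReader> readers = new ArrayList<>();
    for (String f : files(fileCount)) {
      readers.add(new SketchReader(f));
    }
    for (SketchReader r : readers) {
      r.close();
    }
    return SketchReader.peakOpen;
  }
}
```

With 5000 files, {{peakOpenEager(5000)}} reports 5000 readers open at once, while {{peakOpenLazy(5000)}} reports 1. An actual fix would have to live wherever the scan operator consumes the reader list, which this sketch does not model.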



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
