drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #750: DRILL-5273: CompliantTextReader excessive memory us...
Date Thu, 23 Feb 2017 07:06:44 GMT
Github user paul-rogers commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java
    @@ -118,12 +118,21 @@ public boolean apply(@Nullable SchemaPath path) {
        * @param outputMutator  Used to create the schema in the output record batch
        * @throws ExecutionSetupException
    +  @SuppressWarnings("resource")
       public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException
         oContext = context;
    -    readBuffer = context.getManagedBuffer(READ_BUFFER);
    -    whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
    +    // Note: DO NOT use managed buffers here. They remain in existence
    +    // until the fragment is shut down. The buffers here are large.
    --- End diff --
    The reason is a bit different. The original call allocates a managed buffer, which is freed
only when the fragment context shuts down at the end of query execution. If we read many
files (5000 in one test case), we leave 5000 buffers in existence for the whole query.
    Instead, we want to control the buffer lifetime ourselves. We allocate a regular (not managed)
buffer and release it when this reader closes.
    That way, instead of accumulating 5000 buffers of 1 MB each (roughly 5 GB in total), we have
only one 1 MB buffer in existence at any one time.
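
    To make the difference concrete, here is a minimal, self-contained sketch. It does not use
Drill's allocator: CountingAllocator, TrackedBuffer and BufferLifetimeSketch are hypothetical
stand-ins that only count live buffers (no real memory is reserved), and the 1 MB size is taken
from the description above. It compares the peak number of live buffers under the two lifetimes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an allocator; it only counts live buffers. */
class CountingAllocator {
  private int live;
  private int peak;

  /** Hands out a buffer and records the high-water mark of live buffers. */
  TrackedBuffer buffer(int size) {
    live++;
    peak = Math.max(peak, live);
    return new TrackedBuffer(this, size);
  }

  void onRelease() { live--; }
  int peak() { return peak; }
}

/** Hypothetical buffer handle; closing it returns it to the allocator once. */
class TrackedBuffer implements AutoCloseable {
  private final CountingAllocator owner;
  final int size;
  private boolean released;

  TrackedBuffer(CountingAllocator owner, int size) {
    this.owner = owner;
    this.size = size;
  }

  @Override
  public void close() {
    if (!released) {
      released = true;
      owner.onRelease();
    }
  }
}

class BufferLifetimeSketch {
  static final int READ_BUFFER = 1024 * 1024; // assumed 1 MB, per the comment above

  /** Managed style: every reader's buffer survives until the fragment shuts down. */
  static int peakWithManagedBuffers(int fileCount) {
    CountingAllocator allocator = new CountingAllocator();
    List<TrackedBuffer> heldByFragment = new ArrayList<>();
    for (int i = 0; i < fileCount; i++) {
      // Reader setup: the buffer is registered with the fragment, not the reader.
      heldByFragment.add(allocator.buffer(READ_BUFFER));
      // The reader finishes and closes here, but its buffer stays allocated.
    }
    for (TrackedBuffer b : heldByFragment) {
      b.close(); // fragment shutdown releases everything at once
    }
    return allocator.peak();
  }

  /** Reader-owned style: each reader releases its buffer when it closes. */
  static int peakWithReaderOwnedBuffers(int fileCount) {
    CountingAllocator allocator = new CountingAllocator();
    for (int i = 0; i < fileCount; i++) {
      try (TrackedBuffer readBuffer = allocator.buffer(READ_BUFFER)) {
        // Read one file using readBuffer; it is released when the reader closes.
      }
    }
    return allocator.peak();
  }

  public static void main(String[] args) {
    System.out.println("managed peak:      " + peakWithManagedBuffers(5000));
    System.out.println("reader-owned peak: " + peakWithReaderOwnedBuffers(5000));
  }
}
```

    In the real reader, the release would belong in the reader's close path, so that it also
runs when a read fails partway through a file.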
    Of course, a further refinement would be to allocate the buffer on the ScanBatch and have
all 5000 readers share that same buffer sequentially. However, I was not sure that any performance
benefit would be worth the cost in extra code complexity...
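
    For comparison, the shared-buffer refinement might look roughly like the sketch below. This
is not code from the pull request: SharedScanBufferSketch and its methods are hypothetical, and a
plain byte array stands in for a Drill buffer. It only counts how many allocations each approach
performs when readers run one after another.

```java
/** Hypothetical sketch: per-reader buffers versus one scan-owned buffer. */
class SharedScanBufferSketch {
  private int allocations;

  /** Stand-in for a buffer allocation; counts how often it is called. */
  private byte[] allocate(int size) {
    allocations++;
    return new byte[size];
  }

  /** Placeholder for reading one file through the given buffer. */
  private void readOneFile(byte[] readBuffer) {
    readBuffer[0] = 1;
  }

  /** Each reader allocates its own buffer in setup and drops it on close. */
  int runWithPerReaderBuffers(int fileCount, int size) {
    allocations = 0;
    for (int i = 0; i < fileCount; i++) {
      byte[] readBuffer = allocate(size);
      readOneFile(readBuffer);
      // The buffer goes out of scope here, standing in for the reader's release.
    }
    return allocations;
  }

  /** The scan allocates once and lends the same buffer to each reader in turn. */
  int runWithScanOwnedBuffer(int fileCount, int size) {
    allocations = 0;
    byte[] shared = allocate(size);
    for (int i = 0; i < fileCount; i++) {
      readOneFile(shared); // readers borrow the buffer and must not release it
    }
    return allocations;
  }
}
```

    Both approaches keep only one buffer alive at a time; the shared version just saves the
repeated allocate and release calls. The added complexity is that the scan has to own the buffer's
lifetime and every reader has to treat it as borrowed.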
