drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations
Date Tue, 28 Mar 2017 20:15:42 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945864#comment-15945864 ]

Paul Rogers commented on DRILL-5023:

Primarily a development issue; hard to test at the QA level.

> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>                 Key: DRILL-5023
>                 URL: https://issues.apache.org/jira/browse/DRILL-5023
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when doing so, the ESB carves off the first spillable batch and holds it in memory:
> {code}
>     // 1 output container is kept in memory, so we want to hold on to it and transferClone
>     // allows keeping ownership
>     VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
>     c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
>     c1.setRecordCount(count);
> ...
>     BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size grows larger (to fix DRILL-5022), nothing is spilled at all: the first spillable batch is simply stored back in memory on the (supposedly) spilled-batches list.
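To make the arithmetic concrete, here is a minimal model in Java, assuming the merged spill output is cut into output batches of a fixed row count and that only the first of those batches is held back in memory, as the quoted {{c1}} container is. {{HeldBackSpill}} and {{recordsWritten}} are hypothetical names, and the row counts are illustrative, not measurements.

```java
// Hypothetical model of the reported spill behavior; not Drill code.
// Only row counts are tracked; memory sizes and vector layouts are ignored.
class HeldBackSpill {

    // Rows that actually reach disk when the merged spill output is cut into
    // output batches of spillBatchRecords rows and the first output batch is
    // kept in memory (assumed to mirror the c1 transfer clone quoted above).
    static long recordsWritten(long recordsToSpill, long spillBatchRecords) {
        if (spillBatchRecords <= 0) {
            throw new IllegalArgumentException("spillBatchRecords must be positive");
        }
        long heldInMemory = Math.min(recordsToSpill, spillBatchRecords);
        return recordsToSpill - heldInMemory;
    }

    public static void main(String[] args) {
        // Small output batches: all but the first batch reach disk.
        System.out.println(recordsWritten(40_000, 4_096));  // 35904
        // One output batch covers the whole spill: nothing reaches disk.
        System.out.println(recordsWritten(40_000, 65_536)); // 0
    }
}
```

Once the spill batch size reaches the number of rows being spilled, the held-back batch is the entire spill, which matches the "nothing is spilled" symptom described above.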
> The desired behavior is for all spillable batches to be written to disk. If the first batch is held back to work around some issue (to keep a schema, say?), then find a different solution that allows the actual data to spill.
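Under the same simplified model, the desired behavior can be sketched as writing every output batch and, if a schema must survive the spill, retaining only the schema (here just a list of column names) rather than a batch of data. {{FullSpill}}, {{spill}}, and the column names are hypothetical; the issue does not say how the schema should actually be preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the desired spill behavior; not Drill code.
// Every output batch is written to "disk" (modeled as a list of row counts),
// and only the schema is kept in memory.
class FullSpill {

    // Outcome of one spill: the schema retained in memory, the rows still held
    // in memory, and the row count of each output batch written to disk.
    record SpillOutcome(List<String> retainedSchema, long recordsInMemory, List<Long> batchesWritten) {
        long totalWritten() {
            long sum = 0;
            for (long rows : batchesWritten) {
                sum += rows;
            }
            return sum;
        }
    }

    static SpillOutcome spill(List<String> schema, long recordsToSpill, long spillBatchRecords) {
        if (spillBatchRecords <= 0) {
            throw new IllegalArgumentException("spillBatchRecords must be positive");
        }
        List<Long> written = new ArrayList<>();
        // Cut the spilled rows into output batches; the last one may be partial.
        for (long remaining = recordsToSpill; remaining > 0; remaining -= spillBatchRecords) {
            written.add(Math.min(remaining, spillBatchRecords));
        }
        // Retain a copy of the schema only; no row data stays behind.
        return new SpillOutcome(List.copyOf(schema), 0, written);
    }

    public static void main(String[] args) {
        SpillOutcome out = spill(List.of("id", "name"), 40_000, 65_536);
        System.out.println(out.totalWritten());    // 40000
        System.out.println(out.recordsInMemory()); // 0
    }
}
```

Separating the schema from the data means a large spill batch size no longer decides whether any data leaves memory; it only changes how many batches are written.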

This message was sent by Atlassian JIRA
