drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations
Date Wed, 09 Nov 2016 06:22:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649938#comment-15649938 ]

Paul Rogers commented on DRILL-5023:

More detail. This behavior seems to be an artifact of the way that {{BatchGroup}} was written.
It seems to require that each group has a "current container." When spilling, there really
is no need for a current container. But because the close and other methods assume one,
it appears that the code simply adds a container just to get things to work.

The result of this hack is that one spill batch is kept in memory per spill session. This
"overhead" is not considered when determining when to spill next, causing an unaccounted-for
accumulation of in-memory buffered rows.
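
To make the accounting gap concrete, here is a toy model. It is not Drill code: the class, its fields, and the byte sizes are all made up for illustration, and it assumes the spill trigger looks only at the buffered input batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the accounting gap. Not Drill code; all names are hypothetical. */
class SpillAccountingSketch {
    final long budgetBytes;      // memory budget the sort tries to honor
    final long spillBatchBytes;  // size of the one batch retained per spill session
    long trackedBytes = 0;       // what the spill trigger looks at
    long retainedBytes = 0;      // held after each spill, invisible to the trigger
    final List<Long> buffered = new ArrayList<>();

    SpillAccountingSketch(long budgetBytes, long spillBatchBytes) {
        this.budgetBytes = budgetBytes;
        this.spillBatchBytes = spillBatchBytes;
    }

    /** Buffer one incoming batch and spill if the tracked total exceeds the budget. */
    void addBatch(long bytes) {
        buffered.add(bytes);
        trackedBytes += bytes;
        if (trackedBytes > budgetBytes) {
            spill();
        }
    }

    /** Write out everything buffered, but keep one spill batch in memory. */
    void spill() {
        buffered.clear();
        trackedBytes = 0;
        retainedBytes += spillBatchBytes; // never added back to trackedBytes
    }

    /** Memory actually in use: tracked batches plus the retained spill batches. */
    long actualBytes() {
        return trackedBytes + retainedBytes;
    }
}
```

In this model, with a 100-byte budget, 30-byte input batches, and a 40-byte retained batch, ten input batches cause two spills. The trigger then sees 60 bytes while 140 bytes are actually held, and the gap grows with every further spill session.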

The proper solution is to modify the {{BatchGroup}} class for the spill case so that it does
not require a spurious container.
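
A rough sketch of that idea, under the assumption that a spilled group only needs to remember where its data lives on disk. {{BatchGroupSketch}} and its methods are hypothetical and do not mirror the real {{BatchGroup}} API; the {{long[]}} array merely stands in for a vector container.

```java
/** Hypothetical stand-in for a batch group; this is not Drill's BatchGroup API. */
class BatchGroupSketch implements AutoCloseable {
    private long[] container;        // in-memory rows; null for a purely spilled group
    private final String spillPath;  // hypothetical spill file name; null when in memory
    private boolean closed = false;

    private BatchGroupSketch(long[] container, String spillPath) {
        this.container = container;
        this.spillPath = spillPath;
    }

    /** A group whose rows are still buffered in memory. */
    static BatchGroupSketch inMemory(long[] rows) {
        return new BatchGroupSketch(rows, null);
    }

    /** A group whose rows live only on disk, so no container is attached. */
    static BatchGroupSketch spilled(String spillPath) {
        return new BatchGroupSketch(null, spillPath);
    }

    boolean isSpilled() {
        return spillPath != null;
    }

    /** Memory held by this group; zero when it has no container. */
    long memoryBytes() {
        return container == null ? 0L : (long) container.length * Long.BYTES;
    }

    /** Close tolerates a missing container instead of assuming one exists. */
    @Override
    public void close() {
        container = null;
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

The point of the sketch is only that close and the memory accounting handle the "no container" case explicitly, so a spilled group reports zero bytes in memory.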

> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>                 Key: DRILL-5023
>                 URL: https://issues.apache.org/jira/browse/DRILL-5023
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to
> operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when doing so, the
> ESB carves off the first spillable batch and holds it in memory:
> {code}
>     // 1 output container is kept in memory, so we want to hold on to it and transferClone
>     // allows keeping ownership
>     VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
>     c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
>     c1.setRecordCount(count);
> ...
>     BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size gets larger (to fix DRILL-5022), the result is that nothing
> is spilled, as the first spillable batch is simply stored back into memory on the (supposedly)
> spilled batches list.
> The desired behavior is for all spillable batches to be written to disk. If the first
> batch is held back to work around some issue (to keep a schema, say?), then find a different
> solution that allows the actual data to spill.
