drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations
Date Wed, 09 Nov 2016 02:24:58 GMT
Paul Rogers created DRILL-5023:
----------------------------------

             Summary: ExternalSortBatch does not spill fully, throws off spill calculations
                 Key: DRILL-5023
                 URL: https://issues.apache.org/jira/browse/DRILL-5023
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor


The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to operate
within a defined memory budget.

When needed, ESB spills accumulated record batches to disk. However, when doing so, the ESB
carves off the first spillable batch and holds it in memory:

{code}
    // 1 output container is kept in memory, so we want to hold on to it and transferClone
    // allows keeping ownership
    VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
    c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
    c1.setRecordCount(count);
...
    BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
{code}

When the spill batch size is made larger (to fix DRILL-5022), the result is that nothing is
spilled, because the first spillable batch is simply stored back in memory on the (supposedly)
spilled batches list.

The desired behavior is for all spillable batches to be written to disk. If the first batch
is held back to work around some issue (to keep a schema, say?), then find a different solution
that allows the actual data to spill.
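
To make the accounting problem concrete, here is a minimal, self-contained sketch in Java. It is not Drill code: {{SpillSketch}}, {{SpillResult}}, {{toOutputBatches}}, {{spillHoldingFirst}} and {{spillAll}} are hypothetical names invented for illustration, and sizes are in arbitrary units. It also assumes that a larger spill batch size means the spillable data fits in a single output batch, which is one plausible reading of the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one spill pass; none of these names exist in Drill.
public class SpillSketch {

    /** Outcome of one spill pass, in arbitrary size units. */
    public static final class SpillResult {
        public final long heldInMemory;
        public final long writtenToDisk;

        SpillResult(long heldInMemory, long writtenToDisk) {
            this.heldInMemory = heldInMemory;
            this.writtenToDisk = writtenToDisk;
        }
    }

    /** Cuts totalSize units of spillable data into output batches of at most spillBatchSize. */
    public static List<Long> toOutputBatches(long totalSize, long spillBatchSize) {
        if (spillBatchSize <= 0) {
            throw new IllegalArgumentException("spillBatchSize must be positive");
        }
        List<Long> batches = new ArrayList<>();
        for (long remaining = totalSize; remaining > 0; remaining -= spillBatchSize) {
            batches.add(Math.min(remaining, spillBatchSize));
        }
        return batches;
    }

    /** Models the reported behavior: the first output batch stays in memory, the rest is written. */
    public static SpillResult spillHoldingFirst(List<Long> batches) {
        long held = 0;
        long written = 0;
        for (int i = 0; i < batches.size(); i++) {
            if (i == 0) {
                held += batches.get(i);
            } else {
                written += batches.get(i);
            }
        }
        return new SpillResult(held, written);
    }

    /** Models the desired behavior: every spillable batch is written to disk. */
    public static SpillResult spillAll(List<Long> batches) {
        long written = 0;
        for (long size : batches) {
            written += size;
        }
        return new SpillResult(0, written);
    }

    public static void main(String[] args) {
        long spillable = 64; // units of data selected for spilling (assumed figure)
        for (long spillBatchSize : new long[] {8, 64}) {
            List<Long> batches = toOutputBatches(spillable, spillBatchSize);
            SpillResult current = spillHoldingFirst(batches);
            SpillResult desired = spillAll(batches);
            System.out.println("spill batch size " + spillBatchSize
                + ": held=" + current.heldInMemory
                + ", written=" + current.writtenToDisk
                + ", desired written=" + desired.writtenToDisk);
        }
    }
}
```

In this model, a small spill batch size frees most of the memory even though one batch is held back, while a spill batch size equal to the spillable data leaves everything in memory and writes nothing. Any calculation that assumes a spill released the full amount would then be off by the size of the held batch.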



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)