drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort
Date Sat, 25 Feb 2017 03:36:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883987#comment-15883987
] 

ASF GitHub Bot commented on DRILL-5284:
---------------------------------------

Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/761#discussion_r103069029
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
---
    @@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer
sizer) {
         // spill batches of either 64K records, or as many records as fit into the
         // amount of memory dedicated to each spill batch, whichever is less.
     
    -    spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
    +    spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth
/ 2);
         spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
     
    +    // Compute the actual spill batch size which may be larger or smaller
    +    // than the preferred size depending on the row width. Double the estimated
    +    // memory needs to allow for power-of-two rounding.
    +
    +    targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
    +
         // Determine the number of records per batch per merge step. The goal is to
         // merge batches of either 64K records, or as many records as fit into the
         // amount of memory dedicated to each merge batch, whichever is less.
     
    -    targetMergeBatchSize = preferredMergeBatchSize;
    -    mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth);
    +    mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / estimatedRowWidth
/ 2);
         mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
    +    targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
     
         // Determine the minimum memory needed for spilling. Spilling is done just
         // before accepting a batch, so we must spill if we don't have room for a
         // (worst case) input batch. To spill, we need room for the output batch created
         // by merging the batches already in memory. Double this to allow for power-of-two
         // memory allocations.
     
    -    spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
    +    long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
     
         // The merge memory pool assumes we can spill all input batches. To make
         // progress, we must have at least two merge batches (same size as an output
         // batch) and one output batch. Again, double to allow for power-of-two
         // allocation and add one for a margin of error.
     
    -    int minMergeBatches = 2 * 3 + 1;
    -    long minMergeMemory = minMergeBatches * targetMergeBatchSize;
    +    long minMergeMemory = Math.round((2 * targetSpillBatchSize + targetMergeBatchSize)
* 1.05);
     
         // If we are in a low-memory condition, then we might not have room for the
         // default output batch size. In that case, pick a smaller size.
     
    -    long minMemory = Math.max(spillPoint, minMergeMemory);
    -    if (minMemory > memoryLimit) {
    -
    -      // Figure out the minimum output batch size based on memory, but can't be
    -      // any smaller than the defined minimum.
    -
    -      targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / minMergeBatches);
    +    if (minMergeMemory > memoryLimit) {
     
    -      // Regardless of anything else, the batch must hold at least one
    -      // complete row.
    +      // Figure out the minimum output batch size based on memory,
    +      // must hold at least one complete row.
     
    -      targetMergeBatchSize = Math.max(estimatedRowWidth, targetMergeBatchSize);
    -      spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
    -      minMergeMemory = minMergeBatches * targetMergeBatchSize;
    +      long mergeAllowance = Math.round((memoryLimit - 2 * targetSpillBatchSize) * 0.95);
    +      targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 2);
    +      mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth / 2);
    --- End diff --
    
    If estimatedRowWidth is huge, then mergeBatchRowCount may be zero ! 


> Roll-up of final fixes for managed sort
> ---------------------------------------
>
>                 Key: DRILL-5284
>                 URL: https://issues.apache.org/jira/browse/DRILL-5284
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, extensive testing
has identified a number of minor fixes and improvements. Given the long PR cycles, it is not
practical to spend a week or two to do a PR for each fix individually. This ticket represents
a roll-up of a combination of a number of fixes. Small fixes are listed here, larger items
appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message