[ https://issues.apache.org/jira/browse/DRILL5284?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=15883987#comment15883987
]
ASF GitHub Bot commented on DRILL5284:

Github user BenZvi commented on a diff in the pull request:
https://github.com/apache/drill/pull/761#discussion_r103069029
 Diff: exec/javaexec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java

@@ 948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer
sizer) {
// spill batches of either 64K records, or as many records as fit into the
// amount of memory dedicated to each spill batch, whichever is less.
 spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+ spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth
/ 2);
spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
+ // Compute the actual spill batch size which may be larger or smaller
+ // than the preferred size depending on the row width. Double the estimated
+ // memory needs to allow for poweroftwo rounding.
+
+ targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
// Determine the number of records per batch per merge step. The goal is to
// merge batches of either 64K records, or as many records as fit into the
// amount of memory dedicated to each merge batch, whichever is less.
 targetMergeBatchSize = preferredMergeBatchSize;
 mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth);
+ mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / estimatedRowWidth
/ 2);
mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+ targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
// Determine the minimum memory needed for spilling. Spilling is done just
// before accepting a batch, so we must spill if we don't have room for a
// (worst case) input batch. To spill, we need room for the output batch created
// by merging the batches already in memory. Double this to allow for poweroftwo
// memory allocations.
 spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+ long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
// The merge memory pool assumes we can spill all input batches. To make
// progress, we must have at least two merge batches (same size as an output
// batch) and one output batch. Again, double to allow for poweroftwo
// allocation and add one for a margin of error.
 int minMergeBatches = 2 * 3 + 1;
 long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+ long minMergeMemory = Math.round((2 * targetSpillBatchSize + targetMergeBatchSize)
* 1.05);
// If we are in a lowmemory condition, then we might not have room for the
// default output batch size. In that case, pick a smaller size.
 long minMemory = Math.max(spillPoint, minMergeMemory);
 if (minMemory > memoryLimit) {

 // Figure out the minimum output batch size based on memory, but can't be
 // any smaller than the defined minimum.

 targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / minMergeBatches);
+ if (minMergeMemory > memoryLimit) {
 // Regardless of anything else, the batch must hold at least one
 // complete row.
+ // Figure out the minimum output batch size based on memory,
+ // must hold at least one complete row.
 targetMergeBatchSize = Math.max(estimatedRowWidth, targetMergeBatchSize);
 spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
 minMergeMemory = minMergeBatches * targetMergeBatchSize;
+ long mergeAllowance = Math.round((memoryLimit  2 * targetSpillBatchSize) * 0.95);
+ targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 2);
+ mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth / 2);
 End diff 
If estimatedRowWidth is huge, then mergeBatchRowCount may be zero !
> Rollup of final fixes for managed sort
> 
>
> Key: DRILL5284
> URL: https://issues.apache.org/jira/browse/DRILL5284
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL5080. Since that time, extensive testing
has identified a number of minor fixes and improvements. Given the long PR cycles, it is not
practical to spend a week or two to do a PR for each fix individually. This ticket represents
a rollup of a combination of a number of fixes. Small fixes are listed here, larger items
appear as subtasks.

This message was sent by Atlassian JIRA
(v6.3.15#6346)
