[ https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883995#comment-15883995 ]
ASF GitHub Bot commented on DRILL-5284:
---------------------------------------
Github user Ben-Zvi commented on a diff in the pull request:
https://github.com/apache/drill/pull/761#discussion_r103067805
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
// Set too large and the ratio between memory and input data sizes becomes
// small. Set too small and disk seek times dominate performance.
- spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
- spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+ preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+ // In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
+ // can merge.
+
+ preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
--- End diff --
Why restrict the spill batch size to such a small value? This would create more runs and increase the
risk of needing intermediate merges. Besides, during a merge only a single batch
at a time is read from each run, not the whole run (I believe -- if we spill all the remaining
batches at the end ...)
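
To make the trade-off above concrete, here is a minimal sketch of the arithmetic. It is not the ExternalSortBatch code: the class and method names are hypothetical, and the merge model (one in-memory batch per run plus one output batch) is an assumption based on the comment above, not something confirmed by the diff.

```java
// Hypothetical helper, not part of Drill. It illustrates the sizing
// arithmetic discussed in the review comment.
final class SpillBatchSizing {
    private SpillBatchSizing() {}

    // Mirrors the added line in the diff: min(configured size, memoryLimit / 4).
    static long cappedSpillBatchSize(long configuredBytes, long memoryLimitBytes) {
        return Math.min(configuredBytes, memoryLimitBytes / 4);
    }

    // ASSUMPTION: a merge holds one batch per run in memory, plus one output
    // batch, so the number of runs that fit is (memory / batch size) - 1.
    static long maxMergeWidth(long memoryLimitBytes, long spillBatchBytes) {
        if (spillBatchBytes <= 0) {
            throw new IllegalArgumentException("spill batch size must be positive");
        }
        return Math.max(0L, memoryLimitBytes / spillBatchBytes - 1);
    }

    public static void main(String[] args) {
        long memoryLimit = 40L * 1024 * 1024;   // illustrative values only
        long configured = 32L * 1024 * 1024;
        long capped = cappedSpillBatchSize(configured, memoryLimit);
        System.out.println("capped spill batch bytes: " + capped);
        System.out.println("runs mergeable in one pass: "
                + maxMergeWidth(memoryLimit, capped));
    }
}
```

Under this assumed model, the 1/4 cap guarantees that at least three runs fit in a single merge pass, while an uncapped 32 MB batch in 40 MB of memory would leave no room to merge at all. Whether the cap also increases the number of runs, as the comment suggests, depends on how runs are formed, which this sketch does not model.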
> Roll-up of final fixes for managed sort
> ---------------------------------------
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since then, extensive testing
has identified a number of minor fixes and improvements. Given the long PR cycles, it is not
practical to spend a week or two on a separate PR for each fix. This ticket is
a roll-up of a number of those fixes. Small fixes are listed here; larger items
appear as sub-tasks.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)