drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5350) Performance: skip merge for single-batch sort
Date Mon, 13 Mar 2017 00:08:04 GMT
Paul Rogers created DRILL-5350:

             Summary: Performance: skip merge for single-batch sort
                 Key: DRILL-5350
                 URL: https://issues.apache.org/jira/browse/DRILL-5350
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor
             Fix For: 1.11.0

The external sort uses the classic two-step sort/merge process:

* Sort each incoming batch. (Optionally spill batches when needed.)
* Merge batches to create the final output.

The external sort uses two distinct merge phases: one if all batches are in memory, another
if some batches were spilled. The in-memory merge is the faster of the two.

A special case occurs when the sort sees only a single batch of data. In this case, that one
batch is already sorted: there is no reason to also run the merge phase. Skipping the merge
will speed up small "operational" queries.
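
The short-circuit described above can be sketched as follows. This is a minimal illustration only, not Drill's implementation: the class SingleBatchSortSketch, its methods, and the use of plain int arrays as "batches" are all hypothetical, and spilling is left out entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: batches are modeled as plain int arrays, not Drill record batches.
class SingleBatchSortSketch {

    /** Sorts each batch in place, then merges; skips the merge when there is one batch. */
    static int[] sortBatches(List<int[]> batches) {
        for (int[] batch : batches) {
            Arrays.sort(batch);              // step 1: sort each incoming batch
        }
        if (batches.isEmpty()) {
            return new int[0];
        }
        if (batches.size() == 1) {
            return batches.get(0);           // single batch is already sorted: no merge
        }
        return merge(batches);               // step 2: in-memory k-way merge
    }

    /** Merges already-sorted batches using a min-heap of cursors. */
    static int[] merge(List<int[]> batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        int[] out = new int[total];
        // Each heap entry is {batchIndex, positionInBatch}, ordered by the value it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> Integer.compare(batches.get(x[0])[x[1]], batches.get(y[0])[y[1]]));
        for (int i = 0; i < batches.size(); i++) {
            if (batches.get(i).length > 0) {
                heap.add(new int[] {i, 0});
            }
        }
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[n++] = batches.get(top[0])[top[1]];
            if (top[1] + 1 < batches.get(top[0]).length) {
                heap.add(new int[] {top[0], top[1] + 1});   // advance that batch's cursor
            }
        }
        return out;
    }
}
```

With a single batch, sortBatches returns the same array it sorted in place, so no output copy and no heap are created. That avoided work is the kind of overhead the optimization targets.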

The effect of the optimization was measured using low-level unit tests that set up the sort
and measured just the sort run time, omitting normal query overhead. Each run consisted of
two phases. In the first phase, the test code was run five times to warm the JVM and Drill
code cache. Then, the "money" run executed the test another five times. Run times were then averaged.
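
The warm-up-then-measure procedure might look roughly like the sketch below. The class WarmupTimingSketch and the method averageMillis are hypothetical names, not the actual Drill test code, and the sketch assumes that only the second group of runs is timed and averaged.

```java
// Hypothetical sketch of a warm-up-then-measure timing loop; not the actual Drill test code.
class WarmupTimingSketch {

    /**
     * Runs the task warmupRuns times without timing it, then timedRuns more times,
     * and returns the average wall-clock time of the timed runs in milliseconds.
     */
    static double averageMillis(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();                          // phase 1: warm the JVM, results discarded
        }
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();                          // phase 2: the measured runs
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) timedRuns / 1_000_000.0;
    }
}
```

A call such as averageMillis(sortTask, 5, 5), where sortTask is whatever Runnable sets up and runs the sort, would mirror the five warm-up runs and five measured runs described above.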

Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)).

Run time without the optimization: 39 ms.

Run time with the optimization: 25 ms.

The result is about a 36% reduction in run time (14 ms saved out of 39 ms), or a speedup of about 1.56x.

