drill-issues mailing list archives

From "Arina Ielchiieva (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5350) Performance: skip merge for single-batch sort
Date Tue, 07 Nov 2017 10:11:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-5350:
------------------------------------
    Fix Version/s:     (was: 1.12.0)
                   1.13.0

> Performance: skip merge for single-batch sort
> ---------------------------------------------
>
>                 Key: DRILL-5350
>                 URL: https://issues.apache.org/jira/browse/DRILL-5350
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> The external sort uses the classic two-step sort/merge process:
> * Sort each incoming batch. (Optionally spill batches when needed.)
> * Merge batches to create the final output.
> The external sort uses two distinct merge phases: one if all batches are in memory, another
if some batches were spilled. The in-memory merge is the faster of the two.
> A special case occurs when the sort sees only a single batch of data. In this case, that
one batch is already sorted: there is no reason to also run the merge phase. Skipping the
merge will speed up small "operational" queries.
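
The special case above can be sketched in plain Java. This is only an illustration of the idea, not Drill's external sort code: the class and method names (SingleBatchSortSketch, sortBatches, merge) are hypothetical, batches are modeled as int arrays, and spilling is left out entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class SingleBatchSortSketch {

    /** Sorts each batch in place, then merges; skips the merge when only one batch arrived. */
    static int[] sortBatches(List<int[]> batches) {
        // Step 1: sort each incoming batch on its own.
        for (int[] batch : batches) {
            Arrays.sort(batch);
        }
        if (batches.isEmpty()) {
            return new int[0];
        }
        // Special case: a single sorted batch is already the final output.
        if (batches.size() == 1) {
            return batches.get(0);
        }
        // Step 2: merge the sorted batches.
        return merge(batches);
    }

    /** K-way merge of already-sorted batches using a priority queue of batch heads. */
    static int[] merge(List<int[]> sorted) {
        int total = 0;
        for (int[] batch : sorted) {
            total += batch.length;
        }
        int[] out = new int[total];
        // Each queue entry is {batchIndex, offsetInBatch}, ordered by the value it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> Integer.compare(sorted.get(a[0])[a[1]], sorted.get(b[0])[b[1]]));
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).length > 0) {
                heads.add(new int[] {i, 0});
            }
        }
        int n = 0;
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            int[] batch = sorted.get(head[0]);
            out[n++] = batch[head[1]];
            if (head[1] + 1 < batch.length) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

In this sketch the single-batch path returns the sorted input array itself, so it avoids both the priority-queue work and the copy into a new output array.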
> The effect of the optimization was measured using low-level unit tests that set up the
sort and measured just the sort run time, omitting normal query overhead. Each run consisted
of two phases. In the first phase, the test code was run five times to warm the JVM and Drill
code cache. Then, the "money" run ran another five times. Run times were then averaged.
> Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)).
> Run time without the optimization: 39 ms.
> Run time with the optimization: 25 ms.
> The result is a 14 ms drop in run time, about a 36% reduction (roughly a 1.56x speedup).
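
The measurement procedure and the arithmetic on the reported times can be sketched as follows. This is a rough illustration, not the actual Drill unit test: the class and method names (SortTimingSketch, averageMillis, reductionPercent, speedup) are hypothetical, and the task being timed is whatever Runnable the caller supplies.

```java
public class SortTimingSketch {

    /** Runs the task `warmups` times unmeasured, then returns the mean of `measured` timed runs in ms. */
    static double averageMillis(Runnable task, int warmups, int measured) {
        if (measured <= 0) {
            throw new IllegalArgumentException("measured must be positive");
        }
        // Phase 1: warm-up runs, not timed.
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        // Phase 2: timed runs, averaged.
        long totalNanos = 0;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) measured / 1_000_000.0;
    }

    /** Percentage by which the run time dropped, relative to the original run time. */
    static double reductionPercent(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    /** Ratio of the original run time to the new run time. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }
}
```

Taken at face value, the two reported run times give reductionPercent(39, 25) of about 35.9 and speedup(39, 25) of 1.56. Which figure counts as "the improvement" depends on whether the old or the new run time is used as the baseline.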



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
