drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5342) Refactor "managed" external sort for unit tests
Date Thu, 09 Mar 2017 17:48:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903474#comment-15903474 ]

Paul Rogers commented on DRILL-5342:

Code movements include:

* Move the bulk of the external sort implementation from {{ExternalSortBatch}} (ESB) to a
new {{SortImpl}} class. The ESB becomes primarily an implementation of Drill's Volcano-inspired
"next" protocol. {{SortImpl}} does the actual sort work. {{SortImpl}} has a narrow, sub-operator
interface that does not depend on global state.

* Move memory management code to a new {{SortMemoryManager}} class to allow direct testing
of that code. This code was heavily unit tested, and the two memory algorithms (per-batch
memory setup and consolidation spill decisions) were significantly simplified.

* The spill logic was split into two parts. Previously, the code would check a memory condition,
then directly spill as a result of that decision. Now, the decision logic resides in the {{SortMemoryManager}},
which takes the required parameters and returns a consolidation decision: spill, merge, or do nothing.
The caller then acts on that decision. The result is much easier to test and led to many code simplifications.

* Move configuration into a new {{SortConfig}} class to allow direct unit testing.

* The prior version had taken a first crack at pulling the details of code generation into
a separate class: {{OperatorCodeGen}}. This pass breaks the prior class into "wrappers" (need
a better name) that encapsulate each algorithm for which we generate code. Each wrapper holds
the code generation logic (previously in {{OperatorCodeGen}}) and the code that does the work
to use the generated code (previously in various classes).

* Some rationalization was done to the {{Sort}}, {{ExternalSort}} and {{Order}} operator definitions,
such as moving values embedded in code into declared constants, to allow unit testing.

* Similar adjustments were done to the operator definition base class: {{AbstractBase}}.

* Statistics-related code moved to a new {{SortMetrics}} class to allow testing of metrics
gathering separate from the rest of the code. The metrics themselves did not change, only
how they are packaged.
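
To illustrate the split between deciding and acting described for the spill logic above, here is a minimal sketch. It is not the Drill code: the class name {{ConsolidationSketch}}, the {{Action}} enum, the constructor parameters and the thresholds are all hypothetical, chosen only to show a decision function that takes plain values and returns spill, merge or do nothing, with no dependency on an allocator or on {{FragmentContext}}.

```java
// Hypothetical sketch only: names, parameters and thresholds are illustrative
// assumptions, not the actual SortMemoryManager API or algorithm.
final class ConsolidationSketch {

    /** The three possible consolidation decisions. */
    enum Action { SPILL, MERGE, NONE }

    private final long memoryLimit;        // total bytes the sort may use (assumed)
    private final long incomingBatchSize;  // bytes needed for the next input batch (assumed)
    private final int maxMergeWidth;       // most spilled runs merged in one pass (assumed)

    ConsolidationSketch(long memoryLimit, long incomingBatchSize, int maxMergeWidth) {
        this.memoryLimit = memoryLimit;
        this.incomingBatchSize = incomingBatchSize;
        this.maxMergeWidth = maxMergeWidth;
    }

    /**
     * Decides what to do next. This is a pure function of its arguments:
     * it performs no I/O and touches no global state, so a unit test can
     * call it directly with made-up numbers.
     */
    Action decide(long allocatedBytes, int inMemoryBatches, int spilledRuns) {
        // Too many spilled runs to merge in a single pass: consolidate some first.
        if (spilledRuns > maxMergeWidth) {
            return Action.MERGE;
        }
        // No room for another incoming batch, and there is something to spill.
        if (inMemoryBatches > 0 && allocatedBytes + incomingBatchSize > memoryLimit) {
            return Action.SPILL;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        ConsolidationSketch mgr = new ConsolidationSketch(1000L, 200L, 4);
        System.out.println(mgr.decide(100L, 1, 0)); // plenty of memory
        System.out.println(mgr.decide(900L, 3, 0)); // next batch would not fit
        System.out.println(mgr.decide(100L, 1, 5)); // too many spilled runs
    }
}
```

Under these assumptions, the caller (the sort implementation) would switch on the returned value and perform the actual spill or merge, so the decision can be tested with nothing more than a few numbers.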

> Refactor "managed" external sort for unit tests
> -----------------------------------------------
>                 Key: DRILL-5342
>                 URL: https://issues.apache.org/jira/browse/DRILL-5342
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
> The external sort was heavily refactored in 1.10 to allow adding memory management. (See
> DRILL-5080.) That effort focused on breaking up the large functions into smaller chunks to
> allow us to more easily modify the bits relevant to adding memory management.
> This ticket discusses refining those changes to create modular bits that can be unit
> tested individually. For the most part, code will remain unchanged functionally, but functions
> will shift from class to class to break dependencies. Also, interfaces will be narrowed to
> remove global dependencies on things like {{FragmentContext}} or {{OperatorContext}}.
> The code reviewer(s), when using GitHub, will simply see large changes. Notes below identify
> what was actually changed vs. what was merely moved.

