impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5498: Support for partial sorts
Date Mon, 10 Jul 2017 22:35:41 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5498: Support for partial sorts
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/exec/partial-sort-node.cc
File be/src/exec/partial-sort-node.cc:

Line 32:     sorter_(NULL),
> This doesn't need to handle offset?
Done


Line 127: 
> Since this only requires a partial sort, you can exit early when the cumula
As noted elsewhere, removing limit-related code.


Line 140:     *eos = input_eos_;
> I guess it is not clear to me why allowing a limit without an offset is use
Right, so 'offset' never really makes sense with partial-sort-node, since as you say there's
not a total ordering and you're not guaranteed that one instance of partial-sort-node with
an offset can pick up where another instance left off with a limit.

A limit is potentially useful just for cutting off the number of rows returned in a single
partial-sort-node, but with the initial use case for this there's no way to end up with a
partial-sort-node with a limit, so in the interest of keeping things simple and not having
untested code paths, I think I'll just remove the limit code.


http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/exec/partial-sort-node.h
File be/src/exec/partial-sort-node.h:

Line 58:   virtual Status QueryMaintenance(RuntimeState* state);
> Just for general niceness, can you also update be/src/exec/sort-node.h's pr
Done


http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 1384:       expr_mem_pool, &sort_tuple_expr_evals_));
> Need to update this calculation for partial sorts.
Done


Line 1434:   if (sorted_runs_.size() == 1) {
> Unnecessary - this is unconditionally dereferenced
Done


http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 96:   Sorter(const TupleRowComparator& compare_less_than,
> Wasn't the plan to apply the partition sort memory limit to the sorter itse
This will be included in a follow up patch.


http://gerrit.cloudera.org:8080/#/c/7267/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

PS3, Line 349: e {
> Should this be "first" since the order is specified elsewhere.
My problem with that wording is that it could be interpreted as "take the first N elements
of input and then sort them"

Maybe: "Return the first N sorted elements"?


Line 360: struct TSortNode {
> Please ignore my earlier comments about offset handling (although a comment
Added to partial-sort-node.h


http://gerrit.cloudera.org:8080/#/c/7267/3/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

Line 58:   private final long PARTIAL_SORT_MEM_LIMIT = 128 * 1024 * 1024;
> It might be clearer to specify this in bytes, then just make sure that its 
Done


Line 286:       // The memory limit cannot be less than the size of the required blocks.
> This looks good to me.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message