impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
Date Wed, 31 May 2017 00:18:49 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314:         bufferSize = Math.min(bufferSize, Math.max(minSpillableBufferBytes_,
> Estimates are still conservative but more accurate than before - great! The
The effective MT_DOP would depend somewhat on how the scan ranges were divided between hosts
too, right? E.g. if we had 4 scan ranges across 100 hosts, and mt_dop=4, could the planner
assume that the scan ranges end up on different hosts?


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186:         || numNodes_ == 0) {
> What about numNodes_ == -1?
I'm not sure what this was about - I just kept this from the existing code. We don't actually
use num_nodes_ in the calculation so maybe it's defunct and we should instead use num_instances_.


Line 190:       perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_
> For broadcast, it seems we should count the full HT size only once per node
This might need some more thought about how this should work with parallel plans and separate
join builds. It seems like once that's separated out more, we would compute the resource requirement
for the single instance of the join build sink, rather than divide the requirement between
the different fragment instances.


Line 193:         perInstanceDataBytes /= fragment_.getNumInstances(queryOptions.getMt_dop());
> getNumInstances() could return -1
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java
File fe/src/main/java/org/apache/impala/util/BitUtil.java:

Line 1: package org.apache.impala.util;
> Apache header
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 409:     // vary the spillable buffer size..
> trailing "."
Done


Line 415:     PlanNode.minSpillableBufferBytes_ = 64 * 1024;
> Would it make sense to move this into RuntimeEnv. Setting a static variable
I didn't realise there was already a thing that existed for this purpose. Moved it there.
Also changed it so that the default buffer size was derived from --read_size rather than hardcoded.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message