impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2502: don't redundantly repartition grouping aggregations
Date Fri, 04 Mar 2016 23:50:39 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2502: don't redundantly repartition grouping aggregations
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DataPartition.java
File fe/src/main/java/com/cloudera/impala/planner/DataPartition.java:

Line 47:   public DataPartition(TPartitionType type, List<Expr> exprs) {
let's make this private now


http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 743:       throws ImpalaException, InternalException {
ImpalaException should be sufficient


Line 756:       boolean childPartitionEquivalent = ctx_.getRootAnalyzer().equivSets(partitionExprs,
Would be nice to keep the var naming consistent within this file. For the join we have something
like lhsHasCompatPartition. I'm fine with changing the var names in the join of the one here.


Line 760:         // the aggregation in the child fragment.
.. in the child fragment without an extra merge step.


Line 809:       throws ImpalaException, InternalException {
ImpalaException should be sufficient


http://gerrit.cloudera.org:8080/#/c/2414/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 920: from tpch_parquet.customer inner join [shuffle] tpch_parquet.orders on c_custkey
= o_custkey
add a having clause and a limit to make sure those work correctly

also add a new test where with such an aggregation inside an inline view, and the parent block
has a limit + where clause that should be assigned to the aggregation

(some of these cases are covered by more complicated test suggested below)


Line 947: select straight_join count(distinct c_custkey)
Add or modify an existing test where there are interesting expressions on the affected slots.


Line 979: select straight_join c_custkey, count(distinct c_custkey)
Add a new more complicated test that combined two joins and an aggregation in the same fragment,
plus some inline view indirection. Something like:

 select id, bigint_col, count(*) from (
  select straight_join t1.id id, t2.bigint_col as bigint_col
  inner join [shuffle] t2
    on t1.id = t2.id and t1.int_col = t2.bigint_col
  inner join t3
    on t2.id = t3.id and t2.bigint_col = t3.id
) v
group by 1, 2
having count(*) > 10
limit 10


-- 
To view, visit http://gerrit.cloudera.org:8080/2414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iffdcfd3629b8a69bd23915e1adba3b8323cbbaef
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message