impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.
Date Fri, 22 Jul 2016 00:54:46 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during
analysis.
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2415/1/fe/src/main/java/com/cloudera/impala/analysis/TableRef.java
File fe/src/main/java/com/cloudera/impala/analysis/TableRef.java:

Line 111:   // Lists of table ref ids and materialized tuple ids of the full sequence of table
> we talked about separating the tblref abstraction into a) a logical concept
The purpose of caching the table refs ids here is to determine which tables originally appeared
to the left of a certain table during plan generation.

I'm thinking that with the table expr concept this caching will become unnecessary because
the information would be captured in the table expr tree. I think the plan generation from
table exprs would basically make this bug impossible, but maybe I'm missing something.


Line 365:   public List<TupleId> getAllTupleIds() {
> if you want to make the distinction between tuple and tblref ids you should
Done


Line 444:    * table ref ids and materialized tuple ids.
> reference member names directly.
Done


Line 449:     Preconditions.checkState(desc_ != null);
> this depends on analyzeJoin() having been called for leftTblRef; checkState
Done


Line 459:     if (leftTblRef_ != null) {
> interleave so you only need one if-stmt.
Done


http://gerrit.cloudera.org:8080/#/c/2415/1/fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
File fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java:

Line 1389:         List<SlotId> lhsSlotIds = analyzer.getEquivSlots(slotDesc.getId(),
lhsTblRefIds);
> (rhsSid, ...
Done


Line 1393:           // eliminate rows that would have been non-matches of an outer or anti
join.
> this seems to make assumptions about how the returned predicates are being 
The comment is explaining why a uni-directional value transfer is not sufficient. Note that
an equivalence classes contains slots that have 'any' value transfer, not necessarily a mutual
one, so we need to check if here explicitly.

Added a comment to lhsSlotIds.


http://gerrit.cloudera.org:8080/#/c/2415/1/testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
File testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test:

Line 1540:    partitions=1/1 files=4 size=554.13MB
> do you know what this is?
I reverted these changes. We ignore the file size anyway when doing the actual/expected comparison.
We sometimes see slightly different files after upgrading Hive, etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/2415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message