impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
Date Tue, 16 May 2017 05:06:26 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

Line 355:     if (!(this instanceof BaseTableRef) ||
> move || to next line
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/analysis/TableSampleParams.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleParams.java:

Line 24:  * Represents a TABLESAMPLE clause.
> TableSampleClause as the class name then?
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 808:    * Returns a stripped-down clone of this partition with a new list of file descriptors.
> what does stripped-down mean?
Rephrased comment. Setting all fields now.


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1940:    * The 'parcentBytes' parameter must be between 0 and 100.
> parcent
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 158
> comment why you're not tracking referenced partitions here
Extended class comment.


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 92:  * partition pruning and evaluation of other predicates happens after sampling.
> why should partition  pruning happen after sampling?
You are right. The sampling and the other predicates are independent, so their evaluation
order does not matter. It is simpler and more intuitive to sample after pruning.


Line 127:   private List<HdfsPartition> sample_;
> sampled_partitions_? sample_ is a bit sparse
Done


Line 264:    * sample and then prune partitions from the sample.
> why is that?
You are right. The sampling and the other predicates are independent, so their evaluation
order does not matter. It is simpler and more intuitive to sample after pruning.


Line 715:     // Adjust the cardinality based on table sampling.
> separate logically separate block w/ blank line
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

Line 339:     // Not applicable to non-HDFS tables.
> what about collection refs?
Clarified. Added test for collection refs.


Line 340:     //AnalysisError("select * from functional_kudu.alltypes tablesample system (10)",
> why commented out?
My local setup (16.04) can't run Kudu. Uncommented.


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 617:     ParserError("select * from t tablesample (10) a");
> add a negative percentage somewhere
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test:

Line 30: # that is refected in the cardinality.
> reflected
Done


http://gerrit.cloudera.org:8080/#/c/6868/1/tests/query_test/test_tablesample.py
File tests/query_test/test_tablesample.py:

Line 44:   def test_tablesample(self, vector):
> why not simply put this into a query .test file?
Added comment.

1. Without the repeatable clause the test is non-deterministic.
2. I am afraid this test will be flaky due to the non-determinism.
Any change to the number of files or the fils  during data loading can break this test (remember
the table size in planner tests?)

We could use a .test file and then expect a regex, but I think the current solution provides
better coverage.


-- 
To view, visit http://gerrit.cloudera.org:8080/6868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message