impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
Date Wed, 17 May 2017 14:09:51 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1961:     List<Pair<Long, FileDescriptor>> allFiles =
> this is an expensive list, and i don't think you need it. you can achieve t
1. I'm happy to try this proposal, but a key element is missing.
How do you propose to avoid selecting the same file twice? A retry loop? The purpose of this
list is to efficiently avoid selecting the same file twice regardless of the sample percent.
I agree the object generation is probably bad. We can avoid that by using two arrays.
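
For illustration only, here is a minimal sketch of one way to pick files without ever selecting the same one twice and without a retry loop: a partial Fisher-Yates shuffle over a primitive index array. This is not the code in the patch. The class name, method name, and parameters (FileSampleSketch, sampleFileIndexes, fileLengths, targetBytes, seed) are hypothetical, and it assumes the per-file data lives in parallel primitive arrays addressed by the returned indexes instead of in Pair objects.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the HdfsTable implementation.
class FileSampleSketch {
  /**
   * Randomly selects distinct file indexes until the selected files cover at
   * least targetBytes, or until no files remain. fileLengths[i] is assumed to
   * hold the length in bytes of file i.
   */
  static int[] sampleFileIndexes(long[] fileLengths, long targetBytes, long seed) {
    int n = fileLengths.length;
    // Working array of candidate indexes. Slots [0, remaining) are still
    // unselected; slots [remaining, n) hold the files already picked.
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) idx[i] = i;

    Random rnd = new Random(seed);
    int remaining = n;
    long selectedBytes = 0;
    while (selectedBytes < targetBytes && remaining > 0) {
      // Pick uniformly among the unselected candidates only.
      int pick = rnd.nextInt(remaining);
      int chosen = idx[pick];
      selectedBytes += fileLengths[chosen];
      // Swap the chosen index into the selected region at the tail, so it can
      // never be drawn again.
      idx[pick] = idx[remaining - 1];
      idx[remaining - 1] = chosen;
      remaining--;
    }
    // The tail of the working array contains exactly the selected indexes.
    return Arrays.copyOfRange(idx, remaining, n);
  }
}
```

Each draw costs O(1) regardless of the sample percent, and the only allocations are the index array and the returned copy.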

2. As for returning a map instead: I'm happy to do that, but I don't really see why the indirection
over ids/indexes makes sense. What do we gain from this indirection? We will generate more and
bigger objects (map + sets versus a list), and we need to modify computeScanRanges() to probe
that map (or change computeScanRanges() entirely).


-- 
To view, visit http://gerrit.cloudera.org:8080/6868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes
