impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections
Date Tue, 09 Jan 2018 19:23:40 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 6:

(6 comments)

Took a detailed look at the backend.

http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h@462
PS6, Line 462:   std::unordered_map<const TupleDescriptor*, std::unique_ptr<ScopedBuffer>>
It would make more sense at this point to use a MemPool instead of a proliferation of ScopedBuffers
in dict_filter_tuple_map_ and min_max_tuple_buffer_. A MemPool is the canonical way to make multiple
small allocations that are freed at the same time. It would have made sense even before this
patch.

level_cache_pool_ already has the right lifetime (freed in close), so we could just rename
it to something more generic that reflects the lifetime. E.g. we call something similar
a "perm_pool" elsewhere because its contents have the same lifetime as the owning object.
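
To illustrate the pattern I mean, here is a minimal sketch of a pool that hands out many small buffers and frees them all at once. SimplePool and its methods are hypothetical stand-ins written for this example. This is not Impala's MemPool API, just the general idea of tying all allocations to one owner with one lifetime.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a pool allocator; NOT Impala's MemPool.
// Every buffer handed out is owned by the pool and released together.
class SimplePool {
 public:
  // Returns a zero-initialized buffer of 'size' bytes owned by the pool.
  uint8_t* Allocate(size_t size) {
    chunks_.push_back(std::make_unique<uint8_t[]>(size));
    total_bytes_ += size;
    return chunks_.back().get();
  }

  // Frees every allocation made so far in one step, e.g. when the
  // owning object is closed.
  void FreeAll() {
    chunks_.clear();
    total_bytes_ = 0;
  }

  size_t total_bytes() const { return total_bytes_; }
  size_t num_allocations() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<uint8_t[]>> chunks_;
  size_t total_bytes_ = 0;
};
```

With something like this, the per-tuple map would only need to store raw pointers into the pool rather than one owning buffer object per entry.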


http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h@649
PS6, Line 649:   /// Gets the TupleDescriptor of slot_desc.
Mention that 'slot_desc' can belong to the top-level tuple or a tuple nested in a collection?


http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@a1620
PS6, Line 1620: 
thanks :)


http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@771
PS6, Line 771:   if (!col_reader->IsCollectionReader()) {
nit: I think this would be easier to follow if we reversed the branches and removed the negation.
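
A rough sketch of what I mean, using a hypothetical FakeColumnReader and placeholder branch bodies (the real branches in the patch do more than return a string):

```cpp
#include <string>

// Hypothetical reader type exposing only the method named above.
struct FakeColumnReader {
  bool is_collection;
  bool IsCollectionReader() const { return is_collection; }
};

// Current shape: the negated condition comes first.
std::string DescribeNegated(const FakeColumnReader& reader) {
  if (!reader.IsCollectionReader()) {
    return "scalar";
  } else {
    return "collection";
  }
}

// Suggested shape: positive condition first, same behavior.
std::string DescribePositive(const FakeColumnReader& reader) {
  if (reader.IsCollectionReader()) {
    return "collection";
  } else {
    return "scalar";
  }
}
```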


http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@815
PS6, Line 815:   for (auto* col_reader : column_readers_) {
nit: could fit on one line now


http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@1657
PS6, Line 1657:   if (column_readers.empty()) return Status::OK();
Is the early exit necessary for correctness? Might be worth mentioning if it is.

Otherwise, I don't think it matters for performance so my bias is towards leaving it out.
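
For illustration only: if the rest of the function is essentially a loop over the readers, the early exit does not change the result, as in this sketch. The function names and the int element type are hypothetical, and whether this holds for the real function depends on what follows line 1657.

```cpp
#include <vector>

// Hypothetical: counts visited readers, with an explicit early exit.
int VisitWithEarlyExit(const std::vector<int>& readers) {
  if (readers.empty()) return 0;
  int visited = 0;
  for (int reader : readers) {
    (void)reader;  // Placeholder for per-reader work.
    ++visited;
  }
  return visited;
}

// Same logic without the early exit: the loop body simply never runs
// when the vector is empty.
int VisitWithoutEarlyExit(const std::vector<int>& readers) {
  int visited = 0;
  for (int reader : readers) {
    (void)reader;  // Placeholder for per-reader work.
    ++visited;
  }
  return visited;
}
```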



-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Jan 2018 19:23:40 +0000
Gerrit-HasComments: Yes
