impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Date Thu, 12 Oct 2017 17:23:41 GMT
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc@274
PS8, Line 274:   // Before closing the Column readers, the accounted bytes in mem_tracker
for
             :   // dict_decoder_ is released.
             :   if (mem_tracker_ != nullptr) {
             :     for (ParquetColumnReader* col_reader : column_readers_) col_reader->Cleanup();
             :   }
I think this code won't be necessary if col_reader->Close() does the dictionary close.


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc@348
PS8, Line 348: CloseAndUnregisterFromParent
Same question here as elsewhere: Do we really need CloseAndUnregisterFromParent() or will
Close() do?


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc@178
PS8, Line 178: dict_encoder_base_->ClearIndices();
This should call the new base Close() function (which should do ClearIndices()).


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc@1057
PS8, Line 1057: CloseAndUnregisterFromParent();
I think we want to limit use of CloseAndUnregisterFromParent() to cases that really need it.
I think the ordinary Close() should work here. Check if that works.


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.h@234
PS8, Line 234:   /// Delete the bytes used for memory tracking.
             :   virtual void Cleanup() {}
I think this can be folded into Close() and removed.


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.cc@269
PS8, Line 269:   virtual void Cleanup() {
             :     dict_decoder_.Close();
             :   }
I think this can be folded into Close() without a separate Cleanup() function.


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h@107
PS8, Line 107:   DictEncoder(MemPool* pool, int encoded_value_size, MemTracker* mem_tracker)
:
             :       DictEncoderBase(pool), buckets_(HASH_TABLE_SIZE, Node::INVALID_INDEX),
             :       encoded_value_size_(encoded_value_size),
             :       dict_mem_tracker_(mem_tracker),
             :       dict_bytes_cnt_(0) { }
As I understand it, the DictEncoderBase/DictEncoder split is that DictEncoderBase contains
everything that does not rely on T and DictEncoder contains the type-specific stuff. The same
split applies for DictDecoderBase/DictDecoder. Since the memory tracking is not type specific,
I think it makes sense to put it in the base classes. This should simplify the code in HdfsParquetWriter,
because then you can call Close() on the DictEncoderBase rather than having to go to the typed
version. Close() on DictEncoderBase should also do ClearIndices().


http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h@251
PS8, Line 251: ReleaseBytes();
Is there any codepath that should legitimately use this? If not, DCHECK that dict_bytes_cnt_
is zero.



-- 
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 8
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 12 Oct 2017 17:23:41 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message