impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Date Thu, 12 Oct 2017 17:23:41 GMT
Joe McDonnell has posted comments on this change. ( )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Patch Set 8:

File be/src/exec/
PS8, Line 274:   // Before closing the Column readers, the accounted bytes in mem_tracker
             :   // dict_decoder_ is released.
             :   if (mem_tracker_ != nullptr) {
             :     for (ParquetColumnReader* col_reader : column_readers_) col_reader->Cleanup();
             :   }
I think this code won't be necessary if col_reader->Close() does the dictionary close.
PS8, Line 348: CloseAndUnregisterFromParent
Same question here as elsewhere: Do we really need CloseAndUnregisterFromParent() or will
Close() do?
File be/src/exec/
PS8, Line 178: dict_encoder_base_->ClearIndices();
This should call the new base Close() function (which should do ClearIndices()).
PS8, Line 1057: CloseAndUnregisterFromParent();
I think we want to limit use of CloseAndUnregisterFromParent() to cases that really need it.
I think the ordinary Close() should work here. Check if that works.
File be/src/exec/parquet-column-readers.h:
PS8, Line 234:   /// Delete the bytes used for memory tracking.
             :   virtual void Cleanup() {}
I think this can be folded into Close() and removed.
File be/src/exec/
PS8, Line 269:   virtual void Cleanup() {
             :     dict_decoder_.Close();
             :   }
I think this can be folded into Close() without a separate Cleanup() function.
File be/src/util/dict-encoding.h:
PS8, Line 107:   DictEncoder(MemPool* pool, int encoded_value_size, MemTracker* mem_tracker)
             :       DictEncoderBase(pool), buckets_(HASH_TABLE_SIZE, Node::INVALID_INDEX),
             :       encoded_value_size_(encoded_value_size),
             :       dict_mem_tracker_(mem_tracker),
             :       dict_bytes_cnt_(0) { }
As I understand it, the DictEncoderBase/DictEncoder split is that DictEncoderBase contains
everything that does not rely on T and DictEncoder contains the type-specific stuff. The same
split applies for DictDecoderBase/DictDecoder. Since the memory tracking is not type specific,
I think it makes sense to put it in the base classes. This should simplify the code in HdfsParquetWriter,
because then you can call Close() on the DictEncoderBase rather than having to go to the typed
version. Close() on DictEncoderBase should also do ClearIndices().
PS8, Line 251: ReleaseBytes();
Is there any codepath that should legitimately use this? If not, DCHECK that dict_bytes_cnt_
is zero.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 8
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 12 Oct 2017 17:23:41 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message