impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4899: Fix parquet table writer dictionary leak
Date Thu, 02 Mar 2017 02:11:51 GMT
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6181

to look at the new patch set (#5).

Change subject: IMPALA-4899: Fix parquet table writer dictionary leak
......................................................................

IMPALA-4899: Fix parquet table writer dictionary leak

Currently, in HdfsTableSink, OutputPartitions are added to the RuntimeState
object pool to be freed at the end of the query. However, for clustered inserts
into a partitioned table, the OutputPartitions are only used one at a time.
They can be immediately freed once done writing to that partition.

In addition, the HdfsParquetTableWriter's ColumnWriters are also added to
this object pool. These constitute a significant amount of memory, as they
contain the dictionaries for Parquet encoding.

This change makes HdfsParquetTableWriter's ColumnWriters use unique_ptrs so
that they are cleaned up when the HdfsParquetTableWriter is deleted. It also
uses a unique_ptr on the PartitionPair for the OutputPartition.

The table writers maintain a pointer to the OutputPartition. This remains a
raw pointer. This is safe, because OutputPartition has a scoped_ptr to the
table writer. The table writer will never outlive the OutputPartition.

Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.h
5 files changed, 26 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/6181/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06e354086ad24071d4fbf823f25f5df23933688f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message