impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] MPALA-5776: Write partial tuple to the correct mempool
Date Thu, 10 Aug 2017 23:24:06 GMT
Tim Armstrong has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
......................................................................


Patch Set 2:

(10 comments)

This is very subtle. I think the solution works but maybe there are some ways we can make
it clearer why it works.

http://gerrit.cloudera.org:8080/#/c/7639/2//COMMIT_MSG
Commit Message:

Line 7: MPALA-5776: Write partial tuple to the correct mempool
Missing I.


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 169:     row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), false);
Does this even make sense? are there any cases where it's valid for the row batch to reference
data in the boundary pool directly?

It seems like either this is unnecessary or we should be transferring the contents of boundary_pool_
below instead of clearing it.


Line 299:         boundary_column_.Clear();
This is subtle because the contents of boundary_column_ are needed until WriteFields() below,
I think. Maybe it would be clearer to document this or even move it below to the point where
the data is no longer needed.


PS2, Line 765: batches
I'm not sure that I understand the reference to batches. Do they mean blocks?


Line 806:       partial_tuple_->DeepCopy(tuple_, *scan_node_->tuple_desc(), pool);
Can you add a column clarifying what the intent is. E.g. "Copy over all materialized slots
in the partial tuple".


Line 808:       boundary_row_.Reset();
It might be clearer if we factored out this reset logic for boundary_ and partial_tuple_ into
a separate function and documented the pre-conditions for it. I.e. the row has been fully
materialized and all memory has been copied into the RowBatch pool (I think that's the precondition
but not sure).


PS2, Line 810: emptied
cleared? The comment makes me thing that all the memory has been freed, but Clear() just enables
recycling of the chunks.


Line 814:       partial_tuple_ = Tuple::Create(tuple_byte_size_, boundary_pool_.get());
Is it possible to just initialize partial_tuple_ when it's needed? I.e. before we call InitTuple(template_tuple_,
partial_tuple_). Then maybe we can get rid of partial_tuple_empty_ and represent that as partition_tuple_
== nullptr.


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 188:   /// the boundary pool.
Hah! If only the code had matched the comment...


Line 194:   /// Mem pool for boundary_row_ and boundary_column_.
Needs updating since it also backs partial_tuple_.


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message