impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Date Tue, 15 Aug 2017 17:16:22 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
......................................................................


Patch Set 2:

(6 comments)

This is already easier to understand but I'm hoping we can avoid adding yet another MemPool
to the scanners, particularly when the difference from boundary_pool_ isn't clear.

http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 169:     row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), false);
> Actually, after running some tests with ASAN, it turns out that it's not ok
It seemed like the way data in boundary_pool_ is handled is inconsistent  - the code is inconsistent
whether it should be transferred or not, particularly since it makes some effort to copy out
data from boundary_column_ already. I made some comments on the latest PS.


http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS4, Line 317:  still need to handl
Comment needs update


http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 194:   /// Mem pool for boundary_row_ and boundary_column_.
I still don't understand the memory lifetime for boundary_row_, boundary_col_ and this pool.
It seems like it should be the same as partial_tuple_pool_. I.e memory is allocated from it
only when processing a row straddling blocks, and the memory can be safely freed after we've
finished materializing that row.


Line 199:   /// the line as erroneous in case of parsing errors.
Can you clarify whether this data will be referenced by the output batches.


Line 202:   /// Helper string for dealing with columns that span file blocks.
Same here. It looks like some attempt is made to copy data out of this column in CopyBoundaryField(),
maybe it's just missing a case where it isn't copied out. This case looks suspicious:

        // Missing columns or row delimiter at end of the file is ok, fill the row in.
        char* col = boundary_column_.buffer();
        int num_fields = 0;
        RETURN_IF_ERROR(delimited_text_parser_->FillColumns<true>(boundary_column_.len(),
            &col, &num_fields, field_locations_.data()));


Line 230:   /// for boundary tuples.
Can you explain the lifetime of the memory? I.e. we allocate memory from this pool


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message