impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Date Thu, 14 Dec 2017 03:05:48 GMT
Lars Volker has posted comments on this change. ( )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time.

Patch Set 3:


Thank you for working on this! I left some comments around clarifying things and on GetBytesInternal().
File be/src/exec/hdfs-scanner.h:
PS3, Line 97: /// This class also encapsulates row batch management.  Subclasses should call
nit: long line (and the new Gerrit UI breaks it now, making it look bad).
File be/src/exec/scanner-context.h:
PS3, Line 143:     /// (the scan range could be longer than the file).
Can we extend this comment with what the implication are? Is the stream still valid afterwards?
How is this different from scan_range_eosr_?
PS3, Line 146:     /// If true, the stream has reached the end of the file.
Can we extend this comment with what the implication are? Is the stream still valid afterwards?
PS3, Line 161:     bool ReadBoolean(bool* boolean, Status* status);
Should we add WARN_UNUSED_RESULT while you're here?
PS3, Line 193:     /// Release resources from previous reads in this stream. If 'done' is
true all
This only releases buffers that are completely read, right? In that case, wouldn't adding
back "completed" be more clear?
PS3, Line 224: initial
"initial" implies that there are more, no?
PS3, Line 254:     // We always want output_buffer_bytes_left_ to be non-NULL, so we can avoid
a NULL check
nit: long line
PS3, Line 298: 2
PS3, Line 320: Always
             :   /// returns the current I/O buffer to the I/O manager.
This only seems true when the buffer has been read/skipped completely.
File be/src/exec/
PS3, Line 101:     buffer_range->ReturnBuffer(move(io_buffer_));
Can we reset io_buffer_pos_ here, too? It looks to me below like we're using (io_buffer_bytes_left_
== 0, io_buffer_pos_ != nullptr) as an indicator that we're not at the end of the file. Would
it make mistakes less likely if we added that state explicitly in some variable?
PS3, Line 182:   if (eosr()) return Status::OK();
Can you explain here (or in the header at either this function or scan_range_eosr_) why we
don't use scan_range_eosr_ here? It's not clear to me.
PS3, Line 226: Status ScannerContext::Stream::GetBytesInternal(int64_t requested_len,
This method still looks very confusing to me. Can you think of ways to make the handling of
the various cases more obvious? For example by handling case 1 (read from io_buffer) explicitly?
Let's chat in person if you think it'll help.
PS3, Line 228:   DCHECK_GT(requested_len, boundary_buffer_bytes_left_);
Can you also add a DCHECK to document and assert requested_len > io_buffer_bytes_left_?
PS3, Line 260: num_bytes
This is the number of bytes we still need to copy over from the io_buffer, right? Can you
think of a better name? num_bytes_left_to_copy?

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Thu, 14 Dec 2017 03:05:48 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message