impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Date Thu, 14 Dec 2017 03:05:48 GMT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 )

Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time.
......................................................................


Patch Set 3:

(14 comments)

Thank you for working on this! I left some comments around clarifying things and on GetBytesInternal().

http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h@97
PS3, Line 97: /// This class also encapsulates row batch management.  Subclasses should call
CommitRows()
nit: long line (and the new Gerrit UI breaks it now, making it look bad).


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@143
PS3, Line 143:     /// (the scan range could be longer than the file).
Can we extend this comment with what the implication are? Is the stream still valid afterwards?
How is this different from scan_range_eosr_?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@146
PS3, Line 146:     /// If true, the stream has reached the end of the file.
Can we extend this comment with what the implication are? Is the stream still valid afterwards?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@161
PS3, Line 161:     bool ReadBoolean(bool* boolean, Status* status);
Should we add WARN_UNUSED_RESULT while you're here?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@193
PS3, Line 193:     /// Release resources from previous reads in this stream. If 'done' is
true all
This only releases buffers that are completely read, right? In that case, wouldn't adding
back "completed" be more clear?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@224
PS3, Line 224: initial
"initial" implies that there are more, no?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@254
PS3, Line 254:     // We always want output_buffer_bytes_left_ to be non-NULL, so we can avoid
a NULL check
nit: long line


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@298
PS3, Line 298: 2
Huh?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@320
PS3, Line 320: Always
             :   /// returns the current I/O buffer to the I/O manager.
This only seems true when the buffer has been read/skipped completely.


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@101
PS3, Line 101:     buffer_range->ReturnBuffer(move(io_buffer_));
Can we reset io_buffer_pos_ here, too? It looks to me below like we're using (io_buffer_bytes_left_
== 0, io_buffer_pos_ != nullptr) as an indicator that we're not at the end of the file. Would
it make mistakes less likely if we added that state explicitly in some variable?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@182
PS3, Line 182:   if (eosr()) return Status::OK();
Can you explain here (or in the header at either this function or scan_range_eosr_) why we
don't use scan_range_eosr_ here? It's not clear to me.


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@226
PS3, Line 226: Status ScannerContext::Stream::GetBytesInternal(int64_t requested_len,
This method still looks very confusing to me. Can you think of ways to make the handling of
the various cases more obvious? For example by handling case 1 (read from io_buffer) explicitly?
Let's chat in person if you think it'll help.


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@228
PS3, Line 228:   DCHECK_GT(requested_len, boundary_buffer_bytes_left_);
Can you also add a DCHECK to document and assert requested_len > io_buffer_bytes_left_?


http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@260
PS3, Line 260: num_bytes
This is the number of bytes we still need to copy over from the io_buffer, right? Can you
think of a better name? num_bytes_left_to_copy?



-- 
To view, visit http://gerrit.cloudera.org:8080/8814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0
Gerrit-Change-Number: 8814
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Dec 2017 03:05:48 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message