impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
Date Mon, 24 Oct 2016 20:44:52 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements
......................................................................


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS6, Line 145: .
> why?  (that's the more interesting part for the code reader since the code 
We want to end up with the ability to have multiple buffers where we start flushing one or
more of them and continuing to write into new ones. This is because a buffer that is flushing
must block and cannot be freed/re-used again until all pending ops in it complete.

The issue of why set 100mb total mutation space w/ a low threshold took a while for me to
get enough info, but it turns out the API is just confusing. Unfortunately there are a few
ways to specify the same behavior. Since we set the total buffer space and the flush watermark
pct, that implies a number of max buffers (100/7). We could also set the max number of buffers
and pick different values for the total buffer space / flush watermark pct to accomplish the
same thing. This is a known issue and they're looking to improve it moving forward.

I'll update the comment.


Line 282:                      << "previous write operation might be inconsistent.\n";
> can't the write operation be inconsistent even if there was no overflow? i.
Yeah I agree this is poorly worded, I think it'd just be better to take out the 2nd part.


Line 298:         error_msg_buffer << "Kudu error(s) reported, first error: " <<
e.ToString();
> why not just create the Status here since we no longer will append to it la
Done


PS6, Line 330: kudu_error_counter_->value()
> this doesn't really work in case of GetPendingErrors() returning overflow, 
I think so: If that happens we'll fail the query. If the query fails, the 'num_modified_rows'
never gets put in the profile (Coordinator::Wait()) or returned via beeswax (impala-beeswax-server
CloseInsertInternal()) -- both ignore the insert stats when query status is not ok.


http://gerrit.cloudera.org:8080/#/c/4728/6/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

PS6, Line 48: due to a 
> can't it return an error for other reasons?
Done


PS6, Line 49: be reported as warnings.
> isn't the first Kudu error also reported as the error status?
Done


PS6, Line 113: if the Kudu client may be
             :   /// getting behind.
> what does this mean? does the code use it, or is this saying a user could u
It's meant to be used as a profile counter, not for the code.
I tried to clarify.


http://gerrit.cloudera.org:8080/#/c/4728/6/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS6, Line 300: Error Kudu table 
> should this say "Error in Kudu table ..." or something?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <todd@apache.org>
Gerrit-HasComments: Yes

Mime
View raw message