impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Date Tue, 29 Aug 2017 00:22:33 GMT
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5750: Catch exceptions from boost thread creation
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7730/3//COMMIT_MSG
Commit Message:

Line 28: variables or direct declarations in classes.
> Yeah, it does seem like there's an important distinction there.
I labeled the locations that are eligible for fault injection, and then added a parameter
to turn on thread creation fault injection. This should make testing much easier.


http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 361:       COUNTER_ADD(&active_scanner_thread_counter_, -1);
> Oh I see. Seems ok. I'm a little skeptical that it's worth trying to recove
There are arguments in either direction. Given that we think this should be rare, I think
it makes sense to go ahead and abort the query. I changed this code and kudu-scan-node.cc
to do that.


http://gerrit.cloudera.org:8080/#/c/7730/4/be/src/exec/kudu-scan-node.cc
File be/src/exec/kudu-scan-node.cc:

Line 164:       UndoGetNextScanToken(token);
> In this case it does seem simpler to fail the query, since it we mess this 
Changed this to fail the query.


http://gerrit.cloudera.org:8080/#/c/7730/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

Line 330:       // Either get this to work or remove before merge.
> Still relevant?
Removed


Line 350:     Status instance_status = entry.second->WaitForPrepare();
> I think we should preserve the postcondition that all fragment instances ha
That makes sense. One thing that I noticed when making this change is that in some circumstances
the fragment instance will get into an RPC call before the query starts cancelling. In this
case, the RPC call needs to timeout for the query to complete its cancel. The query does eventually
end and nothing crashes.


http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread-pool.h
File be/src/util/thread-pool.h:

Line 61: 
> Yeah I think that would make most sense to me - reduces the number of diffe
Fixed this by doing Shutdown() + Join() in the error condition.


http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.cc
File be/src/util/thread.cc:

Line 301:   unique_ptr<Thread> t(new Thread(category, name));
> Sounds like a good idea to me.
Added.


-- 
To view, visit http://gerrit.cloudera.org:8080/7730
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message