impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Date Thu, 31 Aug 2017 00:08:03 GMT
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5750: Catch exceptions from boost thread creation
......................................................................


Patch Set 6:

(9 comments)

Still going through the comments, but I thought I'd put up some quick replies.

http://gerrit.cloudera.org:8080/#/c/7730/6//COMMIT_MSG
Commit Message:

Line 28: variables or direct declarations in classes.
> any reason Thread and ThreadPool don't follow the same pattern? i.e. both u
ThreadPool uses the constructor+init pattern because we subclass it for the CallableThreadPool.
Subclassing gets awkward with the static create pattern. If someone forgets to run Init(),
they find out when they offer a piece of work to the ThreadPool.

For Thread, I think it would be feasible to use the constructor+init pattern. The difference
is that if someone forgets an Init(), they might be expecting a thread and it just isn't there.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

PS6, Line 363: pool->ReleaseThreadToken(false, true);
> Do you need to drop the lock before calling this? Based on the comment from
Yes, that is what in_callback is for. ReleaseThreadToken can call these thread availability
callbacks, so we could end up back in this code. The in_callback=true prevents that.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS6, Line 59:   RETURN_IF_ERROR(Thread::Create("query-exec-mgr",
            :       Substitute("start-query-finstances-$0", PrintId(query_id)),
            :           &QueryExecMgr::StartQueryHelper, this, qs, &t, true));
> Shouldn't we call ReleaseQueryState(qs); if the thread failed to start here
Yes, that looks like a bug. I will fix in the next upload.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS6, Line 199:     // If this fragment instance is not prepared, it must be reporting an error.
             :     DCHECK(fis->IsPrepared() || !status.ok());
> Is this necessarily true?
The race might exist. I'm looking at the code to check.

The old DCHECK is stricter than the new one. Right now, we require that fis is nullptr or
fis->IsPrepared(). I'm carving out an exception for failure during thread spawn when the
fis won't have a chance to be prepared.


PS6, Line 199:     // If this fragment instance is not prepared, it must be reporting an error.
             :     DCHECK(fis->IsPrepared() || !status.ok());
> I don't really understand this DCHECK.  oh, because we pass 'fis' in the ne
In testing, I noticed that fis==nullptr doesn't actually convey the status. If I use fis==nullptr,
the coordinator never gets notified of the status and the query hangs. When I went through
ReportExecStatusAux, I noticed that apart from some DCHECKs, we don't use the status argument
unless fis is non-null. TReportExecStatusParams doesn't really have a place for status apart
from in the TFragmentInstanceExecStatus's.


PS6, Line 328: prepare_status
> why is this called "prepare_status"?  the old 'prepare_status' was the stat
The idea is to share the codepath below, which is waiting for the fragment instance states
to be prepared. To incorporate the thread create status into the instances_prepared_promise_,
the status needs to be available outside the loop.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

PS6, Line 598: Thread::Create("query-exec-state", "wait-thread",
             :       &ClientRequestState::Wait, this, &wait_thread_);
> Why not inject the thread failures here as well? I think it's an important 
Good point. I will add it to the list of locations that are eligible for fault injection.
I will be doing more test runs as I go, so this will now be tested.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.cc
File be/src/util/thread.cc:

Line 304:     if ((nanos % 100) == 1) {
> on some platforms, the monotonic clock may not actually have nanosecond res
Good point, will make that change.


http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.h
File be/src/util/thread.h:

PS6, Line 158: thread
> nit: 'thread'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7730
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message