impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Date Fri, 01 Sep 2017 23:34:57 GMT
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

to look at the new patch set (#9).

Change subject: IMPALA-5750: Catch exceptions from boost thread creation

IMPALA-5750: Catch exceptions from boost thread creation

The boost thread constructor will throw boost::thread_resource_error
if it is unable to spawn a thread on the system
(e.g. due to a ulimit). This uncaught exception crashes
Impala. Systems with a large number of nodes and threads
are hitting this limit.

This change catches the exception from the thread
constructor and converts it to a Status. This requires
several changes:
1. util/thread.h's Thread constructor is now private
and all Threads are constructed via a new Create()
static factory method.
2. util/thread-pool.h's ThreadPool requires that Init()
be called after the ThreadPool is constructed.
3. To propagate the Status, Threads cannot be created in
constructors, so this is moved to initialization methods
that can return Status.
4. Threads now use unique_ptr's for management in all
cases. Threads cannot be used as stack-allocated local
variables or direct declarations in classes.

Query execution code paths will now handle the error:
1. If the scan node fails to spawn any scanner thread,
it will abort the query.
2. Failing to spawn a fragment instance from the query
state in StartFInstances() will correctly report the error
to the coordinator and tear down the query.

This introduces the parameter thread_creation_fault_injection,
which will cause Thread::Create() calls in eligible
locations to fail randomly roughly 1% of the time.
Quite a few locations of Thread::Create() and
ThreadPool::Init() are necessary for startup and cannot
be eligible. However, all the locations used for query
execution are marked as eligible and governed by this
parameter. The code was tested by setting this parameter
to true and running queries to verify that queries either
run to completion with the correct result or fail with
appropriate status.

Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
M be/src/benchmarks/
M be/src/catalog/
M be/src/catalog/catalog-server.h
M be/src/catalog/
M be/src/common/
M be/src/common/
M be/src/common/init.h
M be/src/exec/
M be/src/exec/
M be/src/exec/hdfs-scan-node.h
M be/src/exec/
M be/src/exec/kudu-scan-node.h
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/auth-provider.h
M be/src/rpc/
M be/src/rpc/
M be/src/rpc/
M be/src/rpc/thrift-server.h
M be/src/rpc/
M be/src/rpc/thrift-thread.h
M be/src/runtime/
M be/src/runtime/disk-io-mgr-handle-cache.h
M be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/
M be/src/runtime/
M be/src/runtime/
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/
M be/src/runtime/
M be/src/runtime/
M be/src/runtime/query-state.h
M be/src/runtime/thread-resource-mgr.h
M be/src/scheduling/
M be/src/scheduling/admission-controller.h
M be/src/service/
M be/src/service/child-query.h
M be/src/service/
M be/src/service/client-request-state.h
M be/src/service/
M be/src/service/
M be/src/service/
M be/src/service/impala-server.h
M be/src/service/
M be/src/statestore/
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/
M be/src/statestore/statestore.h
M be/src/statestore/
M be/src/testutil/
M be/src/testutil/in-process-servers.h
M be/src/util/
M be/src/util/thread-pool.h
M be/src/util/
M be/src/util/thread.h
M common/thrift/
55 files changed, 471 insertions(+), 238 deletions(-)

  git pull ssh:// refs/changes/30/7730/9
To view, visit
To unsubscribe, visit

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-Reviewer: Tim Armstrong <>

View raw message