Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A3A48200D0C for ; Wed, 6 Sep 2017 03:00:10 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 9B56D1609D1; Wed, 6 Sep 2017 01:00:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BABFC1609C5 for ; Wed, 6 Sep 2017 03:00:09 +0200 (CEST) Received: (qmail 92049 invoked by uid 500); 6 Sep 2017 01:00:08 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 92037 invoked by uid 99); 6 Sep 2017 01:00:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Sep 2017 01:00:08 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 209A1181440 for ; Wed, 6 Sep 2017 01:00:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 4ojasex4Xr3S for ; Wed, 6 Sep 2017 01:00:05 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id EA7345FACE for ; Wed, 6 Sep 2017 01:00:04 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v86101eH006085; Wed, 6 Sep 2017 01:00:01 GMT Message-Id: <201709060100.v86101eH006085@ip-10-146-233-104.ec2.internal> Date: Wed, 6 Sep 2017 01:00:00 +0000 From: "Joe McDonnell (Code Review)" To: Sailesh Mukil , Tim Armstrong , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Dan Hecht Reply-To: joemcdonnell@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5750=3A_Catch_exceptions_from_boost_thread_creation=0A?= X-Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 X-Gerrit-ChangeURL: X-Gerrit-Commit: f3f1dfa66b9b764473de9ca30648da5f267c2b2f In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 06 Sep 2017 01:00:10 -0000 Hello Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#11). Change subject: IMPALA-5750: Catch exceptions from boost thread creation ...................................................................... IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 491 insertions(+), 239 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/11 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong