Message-Id: <201706202158.v5KLwbH5001928@ip-10-146-233-104.ec2.internal>
Date: Tue, 20 Jun 2017 21:58:36 +0000
From: "Michael Ho (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Henry Robinson, Sailesh Mukil
Reply-To: kwho@cloudera.com
X-Gerrit-MessageType: newpatchset
Subject: [Impala-ASF-CR] IMPALA-5537: Retry RPC on some exceptions with SSL connection
X-Gerrit-Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
X-Gerrit-Commit: a1b6eaa8b8ea09d09b9517a600dfca4920898c63
User-Agent: Gerrit/2.12.7

Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-5537: Retry RPC on some exceptions with SSL connection
......................................................................

IMPALA-5537: Retry RPC on some exceptions with SSL connection

After the fix for IMPALA-5388, every TSSLException thrown is treated as a fatal error and the query fails. This turns out to be too strict: in a secure cluster under load, queries can easily hit a timeout while waiting for an RPC response.

When running without SSL, we call RetryRpcRecv() to retry the recv part of an RPC if the TSocket underlying the RPC gets an EAGAIN during recv(). This change extends that logic to cover secure connections. In particular, we pattern match against the exception string "SSL_read: Resource temporarily unavailable", which corresponds to the EAGAIN error code being thrown in the SSL_read() path.

Similarly, we handle a closed connection in the send() path of a secure connection by pattern matching against the exception string "TTransportException: Transport not open". To verify that the exception was thrown during the send part of an RPC call, the RPC client interface has been augmented to take a bool* argument, which is set to true after the send part of the RPC has completed but before the recv part starts. If DoRPC() catches an exception and the send part isn't done yet, the entire RPC is retried if the exception string matches one of the substrings that are known to be safe to retry.

The fault injection utility has also been updated to distinguish between a timeout and a lost connection, so that it exercises the different error handling paths in the send and recv paths.

Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
---
A be/src/catalog/catalog-service-client-wrapper.h
M be/src/exec/catalog-op-executor.cc
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-util.cc
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache-types.h
M be/src/runtime/client-cache.h
M be/src/service/client-request-state.cc
A be/src/statestore/statestore-service-client-wrapper.h
A be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/fault-injection-util.cc
M be/src/testutil/fault-injection-util.h
M tests/custom_cluster/test_rpc_exception.py
17 files changed, 375 insertions(+), 74 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/7229/2
--
To view, visit http://gerrit.cloudera.org:8080/7229
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
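
For readers following the commit message above, the retry decision it describes can be sketched roughly as follows. This is only an illustration and not the code in the patch: the two quoted substrings come from the commit message, but ClassifyFailure(), RetryAction, DoRpcWithRetry() and the max_attempts parameter are hypothetical names introduced here. It also assumes that a recv-side retry is only attempted once the send part has completed.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Substrings quoted in the commit message.
const std::string kRecvTimeoutMsg = "SSL_read: Resource temporarily unavailable";
const std::string kSendClosedMsg = "TTransportException: Transport not open";

// Hypothetical classification of what a caller may do after a failure.
enum class RetryAction { kNone, kRetryRecv, kRetryWholeRpc };

// Hypothetical helper: decides from the exception text and the send_done
// flag whether a retry is safe. Not Impala's actual API.
RetryAction ClassifyFailure(const std::string& what, bool send_done) {
  if (!send_done) {
    // The request was not fully sent, so resending cannot duplicate it.
    return what.find(kSendClosedMsg) != std::string::npos
        ? RetryAction::kRetryWholeRpc : RetryAction::kNone;
  }
  // The request went out; the only safe retry is to wait for the reply again.
  return what.find(kRecvTimeoutMsg) != std::string::npos
      ? RetryAction::kRetryRecv : RetryAction::kNone;
}

// rpc sets *send_done to true once the send part has completed (mirroring
// the bool* argument described above); retry_recv re-runs only the recv part.
using RpcFn = std::function<void(bool* send_done)>;
using RecvFn = std::function<void()>;

// Hypothetical driver. Returns true if the RPC eventually succeeded.
bool DoRpcWithRetry(const RpcFn& rpc, const RecvFn& retry_recv, int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    bool send_done = false;
    try {
      rpc(&send_done);
      return true;
    } catch (const std::exception& e) {
      switch (ClassifyFailure(e.what(), send_done)) {
        case RetryAction::kRetryWholeRpc:
          continue;  // Next loop iteration resends the whole RPC.
        case RetryAction::kRetryRecv:
          try {
            retry_recv();
            return true;
          } catch (const std::exception&) {
            return false;  // Second recv failure: give up in this sketch.
          }
        case RetryAction::kNone:
          return false;  // Not a known-safe failure: treat as fatal.
      }
    }
  }
  return false;  // Ran out of attempts.
}
```

The point of the send_done flag in this sketch is that "Transport not open" is only treated as retryable when the request never left the client; once the send has completed, resending could execute the RPC twice on the server, so only the recv part is retried.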