Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 1EF19200CC1 for ; Mon, 26 Jun 2017 01:54:21 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 1D5EA160BF4; Sun, 25 Jun 2017 23:54:21 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3DCE9160BE0 for ; Mon, 26 Jun 2017 01:54:20 +0200 (CEST) Received: (qmail 47682 invoked by uid 500); 25 Jun 2017 23:54:19 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 47671 invoked by uid 99); 25 Jun 2017 23:54:19 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jun 2017 23:54:19 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id D11E2C05B0 for ; Sun, 25 Jun 2017 23:54:18 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id bUxi7tu_1yfj for ; Sun, 25 Jun 2017 23:54:17 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 507015FB2C for ; Sun, 25 Jun 2017 23:54:17 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v5PNsGnA024974; Sun, 25 Jun 2017 23:54:16 GMT Message-Id: <201706252354.v5PNsGnA024974@ip-10-146-233-104.ec2.internal> Date: Sun, 25 Jun 2017 23:54:16 +0000 From: "Michael Ho (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Matthew Jacobs , Juan Yu , Sailesh Mukil , Lars Volker , Dan Hecht , Mostafa Mokhtar Reply-To: kwho@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5558/IMPALA-5576=3A_Reopen_stale_client_connection=0A?= X-Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 X-Gerrit-ChangeURL: X-Gerrit-Commit: abf708c9e64156a1bd4e0f671f993f0be9f3252f In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Sun, 25 Jun 2017 23:54:21 -0000 Michael Ho has posted comments on this change. Change subject: IMPALA-5558/IMPALA-5576: Reopen stale client connection ...................................................................... Patch Set 9: (15 comments) http://gerrit.cloudera.org:8080/#/c/7284/9//COMMIT_MSG Commit Message: PS9, Line 18: to appear succeed > nit: typo, to appear to succeed Missed that before pushing. Will update before merging. http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: PS8, Line 198: TTransportException::END_OF_FILE && : strstr(e.what(), "No more data to read. > Can you put this in a comment as well? Done http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: PS9, Line 200: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: PS9, Line 243: ) { > not suggesting we do this now, but why do we not handle recv cxn closed err Yes, this may be worth some re-thinking. FWIW, most callers cannot handle recv side failure any way. I believe only TrasmitData() / ReportExecStatusAux() and Cancel() will handle it. PS9, Line 301: rpc send $3 done > nit Done http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS8, Line 362: true; > I was also wondering if this return value isn't very useful. What if we ins The status and the backtrace is most likely logged when it's constructed initially deep down the call stack. I will keep the return value as-is for now to avoid more unexpected complication. http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS9, Line 247: duplicated > nit: duplicate Done PS9, Line 248: if (instance_exec_status.done && instance_stats->done_) continue; > Is there any reason we want to process a not done status from a fragment th Good point. We should simply ignore it. The current check is to make sure we don't subtract num_remaining_instances_ more than once for a given fragment instance. http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: PS9, Line 138: duplicated > nit: duplicate Done PS9, Line 138: is done > can you make this more specific here? e.g. maybe "true if the final report Done http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS9, Line 227: d > nit: duplicate Done http://gerrit.cloudera.org:8080/#/c/7284/9/be/src/testutil/fault-injection-util.h File be/src/testutil/fault-injection-util.h: PS9, Line 64: /// > not clear this is the mod Rephrased the comment. PS9, Line 66: freq > maybe this should be every_nth_rpc or such Rephrased the comment. PS9, Line 75: FAULT_INJECTION_RPC_EXCEPTION > it'd be nice to have FAULT_INJECTION_SEND_RPC_EX and FAULT_INJECTION_RECV_R Done http://gerrit.cloudera.org:8080/#/c/7284/9/tests/custom_cluster/test_rpc_exception.py File tests/custom_cluster/test_rpc_exception.py: PS9, Line 72: execute_test_query("Called read on non-open socket" > why does this result in a query failure whereas recv_timed_out does not ? Most likely, in this case, TSocket was closed (potentially due to programming error in our part). In the case of timeout, the socket is still opened but it just hits the timeout we specify when waiting for data to show up. -- To view, visit http://gerrit.cloudera.org:8080/7284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes