Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 0E55F200CB2 for ; Sun, 25 Jun 2017 18:22:59 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0CAE3160BE0; Sun, 25 Jun 2017 16:22:59 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 53031160BD8 for ; Sun, 25 Jun 2017 18:22:58 +0200 (CEST) Received: (qmail 7884 invoked by uid 500); 25 Jun 2017 16:22:57 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 7873 invoked by uid 99); 25 Jun 2017 16:22:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jun 2017 16:22:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id CB907C046F for ; Sun, 25 Jun 2017 16:22:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id ZKAHscs44dnq for ; Sun, 25 Jun 2017 16:22:56 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id C50A55F6C0 for ; Sun, 25 Jun 2017 16:22:55 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v5PGMscj020113; Sun, 25 Jun 2017 16:22:54 GMT Message-Id: <201706251622.v5PGMscj020113@ip-10-146-233-104.ec2.internal> Date: Sun, 25 Jun 2017 16:22:54 +0000 From: "Dan Hecht (Code Review)" To: Michael Ho , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Matthew Jacobs , Juan Yu , Sailesh Mukil , Lars Volker , Mostafa Mokhtar Reply-To: dhecht@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5558/IMPALA-5576=3A_Reopen_stale_client_connection=0A?= X-Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 X-Gerrit-ChangeURL: X-Gerrit-Commit: 4ba3444ba53ce2d1855f0c7d86b3f035986c6e3d In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Sun, 25 Jun 2017 16:22:59 -0000 Dan Hecht has posted comments on this change. Change subject: IMPALA-5558/IMPALA-5576: Reopen stale client connection ...................................................................... Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: PS8, Line 198: TTransportException::END_OF_FILE && : strstr(e.what(), "No more data to read. how do we know that this means the connection was reset? is it because thrift would never get into this state otherwise? http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS8, Line 350: dummy explain why this can be ignored. PS8, Line 362: connect_success i don't understand that. connect_success can be true if the previous client_status was ok yet the last rpc_status was not ok. why do we want to return true only in that case? why not just always return true? the return value is suppose to be if "cancellation was attempted", and it's only used for a log message. http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: PS8, Line 138: message ReportExecStatus RPC messages to be more specific. http://gerrit.cloudera.org:8080/#/c/7284/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 225: // Try to send the RPC 3 times before failing. Sleep for 100ms between retries. it'd be good to explain that retry is always safe for this rpc because the coordinator handles that explicitly. PS8, Line 236: res.status now that we try o get the connection inside the loop, this could be uninitialized, no? or does it get initialized to ok in TReportExecStatusResult constructor? Line 244: Cancel(); this Cancel() still results in IMPALA-5576, no? I'm not saying we should do anything more, though. at least we'll have tried a few times before this. -- To view, visit http://gerrit.cloudera.org:8080/7284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes