Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id AB412200CDA for ; Fri, 4 Aug 2017 20:08:40 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id A46FC16E03D; Fri, 4 Aug 2017 18:08:40 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id EB02F16E03B for ; Fri, 4 Aug 2017 20:08:39 +0200 (CEST) Received: (qmail 87208 invoked by uid 500); 4 Aug 2017 18:08:39 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 87174 invoked by uid 99); 4 Aug 2017 18:08:37 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 Aug 2017 18:08:37 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id E1B481A0B71 for ; Fri, 4 Aug 2017 18:08:36 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id xufnTbkJ6aLY for ; Fri, 4 Aug 2017 18:08:36 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id CC81F5FD1B for ; Fri, 4 Aug 2017 18:08:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v74I8Y8u002937; Fri, 4 Aug 2017 18:08:34 GMT Message-Id: <201708041808.v74I8Y8u002937@ip-10-146-233-104.ec2.internal> Date: Fri, 4 Aug 2017 18:08:34 +0000 From: "Thomas Tauber-Marshall (Code Review)" To: Michael Ho , Sailesh Mukil , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Henry Robinson Reply-To: tmarshall@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5749=3A_coordinator_race_hits_DCHECK_=27num_remaining_backends__=3E_0=27=0A?= X-Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241 X-Gerrit-ChangeURL: X-Gerrit-Commit: 4210c12f0bc93d454481fc574723c4f695f02318 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Fri, 04 Aug 2017 18:08:40 -0000 Hello Michael Ho, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7577 to look at the new patch set (#2). Change subject: IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0' ...................................................................... IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0' In Coordinator::UpdateBackendExecStatus(), we check if the backend has already completed with BackendState::IsDone() and return without applying the update if so to avoid updating num_remaining_backends_ twice for the same completed backend. The problem is that the value of BackendState::IsDone() is updated by the call to BackendState::ApplyExecStatusReport() that comes after it, but these operations are not performed atomically, so if there are two simultaneous calls to UpdateBackendExecStatus(), they can both call IsDone(), both get 'false', and then proceed to erroneously both update num_remaining_backends_, hitting a DCHECK. The solution is to perform both the call to IsDone() and the update to it atomically by holding the BackendState::lock_. Testing: - Ran test_finst_cancel_when_query_complete 10,000 times without hitting the DCHECK (previously, it would hit about once per 300 runs). Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc 3 files changed, 18 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/7577/2 -- To view, visit http://gerrit.cloudera.org:8080/7577 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil