Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 198F2200B9D for ; Thu, 13 Oct 2016 20:39:55 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 13C8F160AE4; Thu, 13 Oct 2016 18:39:55 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 59B13160AD2 for ; Thu, 13 Oct 2016 20:39:54 +0200 (CEST) Received: (qmail 94040 invoked by uid 500); 13 Oct 2016 18:39:53 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 94018 invoked by uid 99); 13 Oct 2016 18:39:52 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Oct 2016 18:39:52 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 121152D080C; Thu, 13 Oct 2016 18:39:52 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5600669269107333255==" MIME-Version: 1.0 Subject: Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation From: Yongzhi Chen To: Yongzhi Chen , Thejas Nair , Vaibhav Gumashta , Sergey Shelukhin Cc: hive , Chaoyu Tang Date: Thu, 13 Oct 2016 18:39:52 -0000 Message-ID: <20161013183952.1719.45529@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Yongzhi Chen X-ReviewGroup: hive X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/52559/ X-Sender: Yongzhi Chen References: <20161012043159.1719.57788@reviews.apache.org> In-Reply-To: <20161012043159.1719.57788@reviews.apache.org> Reply-To: Yongzhi Chen X-ReviewRequest-Repository: hive-git archived-at: Thu, 13 Oct 2016 18:39:55 -0000 --===============5600669269107333255== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52559/#review152556 ----------------------------------------------------------- It is not thread safe for releaseDriverContext can be called in compling mode from cancel, but seems only null value matters. So use the local variable(driverCxt) to avoid NPE after the close() is called from cancel? - Yongzhi Chen On Oct. 12, 2016, 4:31 a.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52559/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2016, 4:31 a.m.) > > > Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and Yongzhi Chen. > > > Bugs: HIVE-14799 > https://issues.apache.org/jira/browse/HIVE-14799 > > > Repository: hive-git > > > Description > ------- > > This patch is going to fix a couple of Driver issues related to the close request from a thread other than the one running the query (e.g. from SQLOperation cancel via Timeout or Ctrl-C): > 1. Driver is not thread safe and usually supports only one thread at time since it has variables like ctx, plan which are not thread protected. But certain special use cases need access the Driver objects from multiply threads. For example, when a query runs in a background thread, driver.close is invoked in another thread by the query timeout (see HIVE-4924). The close process could nullify the shared variables like ctx which could cause NPE in the other query thread which is using them. This runtime exception is unpredictable and not well handled in the code. Some resources (e.g. locks, files) are left behind and not be cleaned because there are no more available = references to them. In this patch, I use the waiting in the close which makes sure only one thread uses these variables and the resource cleaning happens after the query finished (or interrupted). > 2. SQLOperation.cancel sends the interrupt signal to the background thread running the query (via backgroundHandle.cancel(true)) but it could not stop that process since there is no code to capture the signal in the process. In another word, current timeout code could not gracefully and promptly stop the query process, though it could eventually stop the process by killing the running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901). So in the patch, I added a couple of checkpoints to intercept the interrupt signal either set by close method (a volatile variable) or thread.interrupt. They should be helpful to capture these signals earlier , though not intermediately. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 > > Diff: https://reviews.apache.org/r/52559/diff/ > > > Testing > ------- > > Manually tests > Precommit tests > > > Thanks, > > Chaoyu Tang > > --===============5600669269107333255==--