hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation
Date Wed, 12 Oct 2016 01:25:28 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52559/#review152243
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 193)
<https://reviews.apache.org/r/52559/#comment221144>

    comments for closed, destroyed, error and interrupt states would be useful



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 379)
<https://reviews.apache.org/r/52559/#comment221130>

    nit: here and elsewhere, lock should be outside of the try, since we don't want to call
unlock if the lock call itself fails



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 611)
<https://reviews.apache.org/r/52559/#comment221138>

    this will overwrite the previous error (this being finally) without logging it (other
than to console). In fact the original error if any is probably more useful to the user...



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 629)
<https://reviews.apache.org/r/52559/#comment221139>

    this ignores downstreamError. By design? Same for similar code below. The existing code
with this pattern returns immediately in the catch that sets downstreamError, but here the
execution can continue from the above catch



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 1399)
<https://reviews.apache.org/r/52559/#comment221133>

    nit: typo
    
    also is it really true for every other possible state?



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 1557)
<https://reviews.apache.org/r/52559/#comment221135>

    some interrupt-related return paths in the try above will result in isFinishedWithError
being true here... it seems like that's by design right now because error and interrupt conditions
are not distinguished here. It might be helpful to adcd a comment



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 1989)
<https://reviews.apache.org/r/52559/#comment221141>

    same as above - should we return here? seems like the below ignores downstreamError on
some paths



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 2245)
<https://reviews.apache.org/r/52559/#comment221142>

    does it have to throw here or can it just exit? the old code seems to only do stuff when
in the right state, and nothing in the wrong state



ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 2390)
<https://reviews.apache.org/r/52559/#comment221143>

    I don't see an explicit check for this state anywhere in the code, except for some error
paths. What is setting driver state to destroyed supposed to trigger?


- Sergey Shelukhin


On Oct. 11, 2016, 10:12 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52559/
> -----------------------------------------------------------
> 
> (Updated Oct. 11, 2016, 10:12 p.m.)
> 
> 
> Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and Yongzhi
Chen.
> 
> 
> Bugs: HIVE-14799
>     https://issues.apache.org/jira/browse/HIVE-14799
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch is going to fix a couple of Driver issues related to the close request from
a thread other than the one running the query (e.g. from SQLOperation cancel via Timeout or
Ctrl-C):
> 1. Driver is not thread safe and usually supports only one thread at time since it has
variables like ctx, plan which are not thread protected. But certain special use cases need
access the Driver objects from multiply threads. For example, when a query runs in a background
thread, driver.close is invoked in another thread by the query timeout (see HIVE-4924). The
close process could nullify the shared variables like ctx which could cause NPE in the other
query thread which is using them. This runtime exception is unpredictable and not well handled
in the code. Some resources (e.g. locks, files) are left behind and not be cleaned because
there are no more available = references to them. In this patch, I use the waiting in the
close which makes sure only one thread uses these variables and the resource cleaning happens
after the query finished (or interrupted).
> 2. SQLOperation.cancel sends the interrupt signal to the background thread running the
query (via backgroundHandle.cancel(true)) but it could not stop that process since there is
no code to capture the signal in the process. In another word, current timeout code could
not gracefully and promptly stop the query process, though it could eventually stop the process
by killing the running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901).
So in the patch, I added a couple of checkpoints to intercept the interrupt signal either
set by close method (a volatile variable) or thread.interrupt. They should be helpful to capture
these signals earlier , though not intermediately.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 
> 
> Diff: https://reviews.apache.org/r/52559/diff/
> 
> 
> Testing
> -------
> 
> Manually tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message