hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chaoyu Tang <ctang...@gmail.com>
Subject Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation
Date Thu, 06 Oct 2016 05:09:27 GMT


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 3144
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523051#file1523051line3144>
> >
> >     nit: needs a better explanation from a user's perspective... user doesn't know
what is a driver and why would it be closed is not clear here

rewording the explanation and hope it will be better. feel free to suggest one if you have
a good one.


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 346
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523052#file1523052line346>
> >
> >     is this called from elsewhere too? the method existed before

It is just a default implementation for the interface CommandProcessor, and not used at all
before. To be less confusing, I won't change this init() and the DriverState is default initiated
as INITIALIZED.


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 564
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523052#file1523052line564>
> >
> >     iirc if you catch InterruptException, in the catch block the interrupt flag
would be reset, so this will not return true

yes, the isInterrupt is implemented as
  private boolean isInterrupted() {
    return Thread.currentThread().isInterrupted() || interrupt;
  }
and so far we do not have wait/sleep which can throw InterruptException before any added checkpoints.
In addition, the volatile variable interrupt is set when close is called and unset after it
exits. So the isInterrupt is basically not affected by InterruptException and there is no
concern.


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 1913
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523052#file1523052line1913>
> >
> >     same thing wrt interrupt state

won't be a concern as commented above


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 2238
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523052#file1523052line2238>
> >
> >     1) these states do not appear thread-safe themselves
> >     2) why are so many states needed, by the way?
> >     e.g. what difference does it make for termination if it's compiling or running?

1) the driverState is volatile, should be thread safe
2) they corresponds to different driver states since the driver is a stateful object. I remove
the "UNINITILIAZED" and initialize DriverState default as INITIALIZED.
We only have state RUNNING which represents the driver is compiling, executing, or both in
one call (like run with alreadyCompiled false). No matter the other query is compiling or
executing, for close there is no difference and it should wait anyway until the other process
finishes or interrupts, so one state RUNNING is sufficient to cover compiling or/and executing
cases.


> On Oct. 5, 2016, 8:07 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 1155
> > <https://reviews.apache.org/r/52559/diff/1/?file=1523052#file1523052line1155>
> >
> >     is it possible to add a lock object for this and add javadoc to make it explicit
what it covers? I doesn't look like all ctx accesses are covered.

I changed to use a reentrant lock for all the synchronized calls. 
The reason I used the synchronized (this) since it is sufficient in this case and we do not
need some advanced features from using the lock object, in addition, there is an existing
method like releasePlan(plan) is already the synchronized method.
We do not the locks for ctx access since it has been gurantteed to be accessed by one thread
since the close thread is halt in waiting when the query process is accesing this variable.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52559/#review151537
-----------------------------------------------------------


On Oct. 5, 2016, 4:56 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52559/
> -----------------------------------------------------------
> 
> (Updated Oct. 5, 2016, 4:56 p.m.)
> 
> 
> Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and Yongzhi
Chen.
> 
> 
> Bugs: HIVE-14799
>     https://issues.apache.org/jira/browse/HIVE-14799
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch is going to fix a couple of Driver issues related to the close request from
a thread other than the one running the query (e.g. from SQLOperation cancel via Timeout or
Ctrl-C):
> 1. Driver is not thread safe and usually supports only one thread at time since it has
variables like ctx, plan which are not thread protected. But certain special use cases need
access the Driver objects from multiply threads. For example, when a query runs in a background
thread, driver.close is invoked in another thread by the query timeout (see HIVE-4924). The
close process could nullify the shared variables like ctx which could cause NPE in the other
query thread which is using them. This runtime exception is unpredictable and not well handled
in the code. Some resources (e.g. locks, files) are left behind and not be cleaned because
there are no more available = references to them. In this patch, I use the waiting in the
close which makes sure only one thread uses these variables and the resource cleaning happens
after the query finished (or interrupted).
> 2. SQLOperation.cancel sends the interrupt signal to the background thread running the
query (via backgroundHandle.cancel(true)) but it could not stop that process since there is
no code to capture the signal in the process. In another word, current timeout code could
not gracefully and promptly stop the query process, though it could eventually stop the process
by killing the running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901).
So in the patch, I added a couple of checkpoints to intercept the interrupt signal either
set by close method (a volatile variable) or thread.interrupt. They should be helpful to capture
these signals earlier , though not intermediately.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4c3ef3e 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 
> 
> Diff: https://reviews.apache.org/r/52559/diff/
> 
> 
> Testing
> -------
> 
> Manually tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message