Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Derby Discussion" <derby-user@db.apache.org>
Received-SPF: neutral (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <AANLkTimX7JPnq82a-E3s10yRAyGbC6cyHPEHCTJ-I3Of@mail.gmail.com>
References: <AANLkTimmz4VHjRFJSbNl5C0Y-gKYMQPK0Z9J683RBzND@mail.gmail.com>
	<4C1F2855.404@Sun.COM>
	<AANLkTimX7JPnq82a-E3s10yRAyGbC6cyHPEHCTJ-I3Of@mail.gmail.com>
Date: Tue, 22 Jun 2010 16:46:25 +1000
Message-ID: <AANLkTimZp1Markjd-LoO2uLq_iJAAxuIRcuwH4OYiU0I@mail.gmail.com>
Subject: Re: Hung thread after another thread is interrupted
From: Luke Quinane <luke@nuix.com>
To: Derby Discussion <derby-user@db.apache.org>
Content-Type: multipart/mixed; boundary=000e0cd723b0b142b6048998c768

--000e0cd723b0b142b6048998c768
Content-Type: multipart/alternative; boundary=000e0cd723b0b142ab048998c766

--000e0cd723b0b142ab048998c766
Content-Type: text/plain; charset=ISO-8859-1

Hi Kristian,

I've played around with the code a bit during the day and the attached patch
seems to fix the problem where another thread is stuck. I've never really
look at the Derby code before so I'm not confident that it is correct
however it did seem to fix things for me.


Cheers,

Luke


On Tue, Jun 22, 2010 at 11:25 AM, Luke Quinane <luke@nuix.com> wrote:

> Hi Kristian,
>
> I've attached our Derby log file that has all the threads that were
> interrupted when the problem occurred (quite a few).
>
> Our application was in the middle of a user initiated query when the user
> started another operation. For performance reasons we call shutdown() on the
> previous thread pool which causes the interrupt() call on all the threads.
> As you can see in the stack traces some threads are querying Lucene and some
> are querying Derby.
>
> Mostly we just want to cancel the current query but from the higher levels
> of the application its not apparent if a thread is processing a Lucene or a
> Derby query.
>
>
> Thanks again,
>
> Luke
>
>
> On Mon, Jun 21, 2010 at 6:52 PM, Kristian Waagan <Kristian.Waagan@sun.com>wrote:
>
>> On 21.06.10 09:20, Luke Quinane wrote:
>>
>>> Hi all,
>>>
>>> We've seen a problem today where we have several threads querying our
>>> database and when one gets interrupted the others are stuck waiting for a
>>> lock. Here is the stack trace for the stuck thread(s):
>>>
>>> daemon prio=4
>>> "DefaultExecutorService-pool-1-thread-47" Id=98 WAITING on
>>> org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1
>>> at java.lang.Object.wait(Native Method)
>>> -  waiting on org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1
>>> at java.lang.Object.wait(Object.java:485)
>>> at
>>> org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:115)
>>> at
>>> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:463)
>>> at
>>> org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(ConcurrentLockSet.java:855)
>>> at
>>> org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(AbstractPool.java:297)
>>> at
>>> org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(ConcurrentPool.java:28)
>>> at
>>> org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(RowLocking2nohold.java:89)
>>> at
>>> org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:520)
>>> at
>>> org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:638)
>>> at
>>> org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(B2IRowLocking3.java:309)
>>> at
>>> org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(B2IRowLocking3.java:599)
>>> at
>>> org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(B2IRowLockingRR.java:105)
>>> at
>>> org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:305)
>>> at
>>> org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1585)
>>> at
>>> org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(BulkTableScanResultSet.java:327)
>>> at
>>> org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(BulkTableScanResultSet.java:282)
>>> at
>>> org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:460)
>>> at
>>> org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:427)
>>> -  locked org.apache.derby.impl.jdbc.EmbedConnection40@445d374b
>>> at
>>> org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:371)
>>>        ...
>>>
>>> So is using Thread.interrupt() the correct way to interrupt a long
>>> running query? Currently our worker threads are using
>>> 'java.util.concurrent.ThreadPoolExecutor' to run our queries and it calls
>>> 'interrupt()' when it shuts down.
>>>
>>
>> Hi Luke,
>>
>> Unfortunately, Derby isn't currently able to cope correctly with getting
>> interrupted by Thread.interrupt(). Based on limited information, and as it
>> stands today, I think your options are:
>>  a) code special shutdown logic for Derby
>>  b) do not invoke Thread.interrupt()
>>  c) force Derby to not use NIO (?)
>>
>> This issue seems like something the Derby community has to address, as it
>> is easily encountered using Java standard libraries (i.e.
>> java.util.concurrent).
>> It is not clear to me why the lock system fails. Do you have a stack trace
>> from the thread being interrupted?
>>
>> Also, can you describe your requirements for the process issuing
>> Thread.interrupt()?
>> I.e, do you want to shut down your application "hard", or just to cancel a
>> single running query?
>> You may find some helpful information in this thread (see especially the
>> posts from Stephan and Knut):
>> http://old.nabble.com/cancel-a-running-query--td19903183.html
>>
>>
>> Regards,
>> --
>> Kristian
>>
>>
>>
>>> This was with Derby 10.5.3.0 and JRE 1.6.0_19-b04 on Windows Server 2008.
>>>
>>>
>>> Thanks in advance,
>>>
>>>
>>> Luke Quinane
>>>
>>>
>>>
>>
>

--000e0cd723b0b142ab048998c766
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi=A0Kristian,<div><br></div><div>I&#39;ve played around with the code a bi=
t during the day and the attached patch seems to fix the problem where anot=
her thread is stuck. I&#39;ve never really look at the Derby code before so=
 I&#39;m not confident that it is correct however it did seem to fix things=
 for me.</div>
<div><br></div><div><br></div><div>Cheers,</div><div><br></div><div>Luke</d=
iv><div><br><br><div class=3D"gmail_quote">On Tue, Jun 22, 2010 at 11:25 AM=
, Luke Quinane <span dir=3D"ltr">&lt;<a href=3D"mailto:luke@nuix.com">luke@=
nuix.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex;">Hi=A0Kristian,<div><br></div><div>I&#39;ve =
attached our Derby log file that has all the threads that were interrupted =
when the problem occurred (quite a few).</div>
<div><br></div><div>Our application was in the middle of a user initiated q=
uery when the user started another operation. For performance reasons we ca=
ll shutdown() on the previous thread pool which causes the interrupt() call=
 on all the threads. As you can see in the stack traces some threads are qu=
erying Lucene and some are querying Derby.</div>

<div><br></div><div>Mostly we just want to cancel the current query but fro=
m the higher levels of the application its not apparent if a thread is proc=
essing a Lucene or a Derby query.</div><div><br></div><div><br></div><div>

Thanks again,</div><div><br></div><font color=3D"#888888"><div>Luke</div></=
font><div><div></div><div class=3D"h5"><div><br></div><div><br><div class=
=3D"gmail_quote">On Mon, Jun 21, 2010 at 6:52 PM, Kristian Waagan <span dir=
=3D"ltr">&lt;<a href=3D"mailto:Kristian.Waagan@sun.com" target=3D"_blank">K=
ristian.Waagan@sun.com</a>&gt;</span> wrote:<br>

<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div><div></div><div>On 21.06.10 09:20, Luke=
 Quinane wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
Hi all,<br>
<br>
We&#39;ve seen a problem today where we have several threads querying our d=
atabase and when one gets interrupted the others are stuck waiting for a lo=
ck. Here is the stack trace for the stuck thread(s):<br>
<br>
daemon prio=3D4<br>
&quot;DefaultExecutorService-pool-1-thread-47&quot; Id=3D98 WAITING on org.=
apache.derby.impl.services.locks.ActiveLock@6e6f45a1<br>
at java.lang.Object.wait(Native Method)<br>
- =A0waiting on org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1<br=
>
at java.lang.Object.wait(Object.java:485)<br>
at org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.=
java:115)<br>
at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Concur=
rentLockSet.java:463)<br>
at org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockO=
bject(ConcurrentLockSet.java:855)<br>
at org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject=
(AbstractPool.java:297)<br>
at org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObje=
ct(ConcurrentPool.java:28)<br>
at org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead=
(RowLocking2nohold.java:89)<br>
at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapContr=
oller.java:520)<br>
at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapContr=
oller.java:638)<br>
at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnP=
age(B2IRowLocking3.java:309)<br>
at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanR=
ow(B2IRowLocking3.java:599)<br>
at org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanR=
ow(B2IRowLockingRR.java:105)<br>
at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTre=
eForwardScan.java:305)<br>
at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeS=
can.java:1585)<br>
at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(Bul=
kTableScanResultSet.java:327)<br>
at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(=
BulkTableScanResultSet.java:282)<br>
at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Bas=
icNoPutResultSetImpl.java:460)<br>
at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.ja=
va:427)<br>
- =A0locked org.apache.derby.impl.jdbc.EmbedConnection40@445d374b<br>
at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:371)<=
br>
 =A0 =A0 =A0 =A0...<br>
<br>
So is using Thread.interrupt() the correct way to interrupt a long running =
query? Currently our worker threads are using &#39;java.util.concurrent.Thr=
eadPoolExecutor&#39; to run our queries and it calls &#39;interrupt()&#39; =
when it shuts down.<br>


</blockquote>
<br></div></div>
Hi Luke,<br>
<br>
Unfortunately, Derby isn&#39;t currently able to cope correctly with gettin=
g interrupted by Thread.interrupt(). Based on limited information, and as i=
t stands today, I think your options are:<br>
=A0a) code special shutdown logic for Derby<br>
=A0b) do not invoke Thread.interrupt()<br>
=A0c) force Derby to not use NIO (?)<br>
<br>
This issue seems like something the Derby community has to address, as it i=
s easily encountered using Java standard libraries (i.e. java.util.concurre=
nt).<br>
It is not clear to me why the lock system fails. Do you have a stack trace =
from the thread being interrupted?<br>
<br>
Also, can you describe your requirements for the process issuing Thread.int=
errupt()?<br>
I.e, do you want to shut down your application &quot;hard&quot;, or just to=
 cancel a single running query?<br>
You may find some helpful information in this thread (see especially the po=
sts from Stephan and Knut): <a href=3D"http://old.nabble.com/cancel-a-runni=
ng-query--td19903183.html" target=3D"_blank">http://old.nabble.com/cancel-a=
-running-query--td19903183.html</a><br>


<br>
<br>
Regards,<br>
-- <br><font color=3D"#888888">
Kristian</font><div><div></div><div><br>
<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
<br>
This was with Derby 10.5.3.0 and JRE 1.6.0_19-b04 on Windows Server 2008.<b=
r>
<br>
<br>
Thanks in advance,<br>
<br>
<br>
Luke Quinane<br>
<br>
<br>
</blockquote>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>

--000e0cd723b0b142ab048998c766--

--000e0cd723b0b142b6048998c768
Content-Type: application/octet-stream;
	name="interrupted-exception-fix.patch"
Content-Disposition: attachment; filename="interrupted-exception-fix.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gaqdfpju1

SW5kZXg6IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdpbmUvb3JnL2FwYWNo
ZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0xvY2tTZXQuamF2YQ0KPT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQ0KLS0t
IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdpbmUvb3JnL2FwYWNoZS9kZXJi
eS9pbXBsL3NlcnZpY2VzL2xvY2tzL0xvY2tTZXQuamF2YQkocmV2aXNpb24gOTM0OCkNCisrKyBD
Oi9Xb3JraW5nL3ZlbmRvci9kZXJieS9kaXN0L2phdmEvZW5naW5lL29yZy9hcGFjaGUvZGVyYnkv
aW1wbC9zZXJ2aWNlcy9sb2Nrcy9Mb2NrU2V0LmphdmEJKHJldmlzaW9uIDEyMDA4KQ0KQEAgLTI3
MiwxMyArMjcyLDE4IEBADQogICAgICAgICBsb25nIHN0YXJ0V2FpdFRpbWUgPSAwOw0KIA0KIGZv
cmV2ZXI6CWZvciAoOzspIHsNCi0NCi0gICAgICAgICAgICAgICAgYnl0ZSB3YWtldXBSZWFzb24g
PSB3YWl0aW5nTG9jay53YWl0Rm9yR3JhbnQoYWN0dWFsVGltZW91dCk7DQotICAgICAgICAgICAg
ICAgIA0KKyAgICAgICAgICAgICAgICBieXRlIHdha2V1cFJlYXNvbiA9IDA7DQogICAgICAgICAg
ICAgICAgIEFjdGl2ZUxvY2sgbmV4dFdhaXRpbmdMb2NrID0gbnVsbDsNCiAgICAgICAgICAgICAg
ICAgT2JqZWN0W10gZGVhZGxvY2tEYXRhID0gbnVsbDsNCiANCiAgICAgICAgICAgICAgICAgdHJ5
IHsNCisgICAgICAgICAgICAgICAgICAgIHRyeSB7DQorICAgICAgICAgICAgICAgICAgICAgICAg
d2FrZXVwUmVhc29uID0gd2FpdGluZ0xvY2sud2FpdEZvckdyYW50KGFjdHVhbFRpbWVvdXQpOw0K
KyAgICAgICAgICAgICAgICAgICAgfSBjYXRjaChTdGFuZGFyZEV4Y2VwdGlvbiBlKSB7DQorICAg
ICAgICAgICAgICAgICAgICAgICAgbmV4dFdhaXRpbmdMb2NrID0gY29udHJvbC5nZXROZXh0V2Fp
dGVyKHdhaXRpbmdMb2NrLCB0cnVlLCB0aGlzKTsNCisgICAgICAgICAgICAgICAgICAgICAgICB0
aHJvdyBlOw0KKyAgICAgICAgICAgICAgICAgICAgfQ0KKw0KICAgICAgICAgICAgICAgICAgICAg
Ym9vbGVhbiB3aWxsUXVpdFdhaXQ7DQogICAgICAgICAgICAgICAgICAgICBFbnVtZXJhdGlvbiB0
aW1lb3V0TG9ja1RhYmxlID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgICAgIGxvbmcgY3VycmVu
dFRpbWUgPSAwOw0KSW5kZXg6IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdp
bmUvb3JnL2FwYWNoZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0NvbmN1cnJlbnRMb2NrU2V0
LmphdmENCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT0NCi0tLSBDOi9Xb3JraW5nL3ZlbmRvci9kZXJieS9kaXN0L2phdmEv
ZW5naW5lL29yZy9hcGFjaGUvZGVyYnkvaW1wbC9zZXJ2aWNlcy9sb2Nrcy9Db25jdXJyZW50TG9j
a1NldC5qYXZhCShyZXZpc2lvbiA5MzQ4KQ0KKysrIEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rp
c3QvamF2YS9lbmdpbmUvb3JnL2FwYWNoZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0NvbmN1
cnJlbnRMb2NrU2V0LmphdmEJKHJldmlzaW9uIDEyMDA4KQ0KQEAgLTQ1OSwxMyArNDU5LDE4IEBA
DQogICAgICAgICBsb25nIHN0YXJ0V2FpdFRpbWUgPSAwOw0KIA0KIGZvcmV2ZXI6CWZvciAoOzsp
IHsNCi0NCi0gICAgICAgICAgICAgICAgYnl0ZSB3YWtldXBSZWFzb24gPSB3YWl0aW5nTG9jay53
YWl0Rm9yR3JhbnQoYWN0dWFsVGltZW91dCk7DQotICAgICAgICAgICAgICAgIA0KKyAgICAgICAg
ICAgICAgICBieXRlIHdha2V1cFJlYXNvbiA9IDA7DQogICAgICAgICAgICAgICAgIEFjdGl2ZUxv
Y2sgbmV4dFdhaXRpbmdMb2NrID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgT2JqZWN0W10gZGVh
ZGxvY2tEYXRhID0gbnVsbDsNCiANCiAgICAgICAgICAgICAgICAgdHJ5IHsNCisgICAgICAgICAg
ICAgICAgICAgIHRyeSB7DQorICAgICAgICAgICAgICAgICAgICAgICAgd2FrZXVwUmVhc29uID0g
d2FpdGluZ0xvY2sud2FpdEZvckdyYW50KGFjdHVhbFRpbWVvdXQpOw0KKyAgICAgICAgICAgICAg
ICAgICAgfSBjYXRjaChTdGFuZGFyZEV4Y2VwdGlvbiBlKSB7DQorICAgICAgICAgICAgICAgICAg
ICAgICAgbmV4dFdhaXRpbmdMb2NrID0gY29udHJvbC5nZXROZXh0V2FpdGVyKHdhaXRpbmdMb2Nr
LCB0cnVlLCB0aGlzKTsNCisgICAgICAgICAgICAgICAgICAgICAgICB0aHJvdyBlOw0KKyAgICAg
ICAgICAgICAgICAgICAgfQ0KKw0KICAgICAgICAgICAgICAgICAgICAgYm9vbGVhbiB3aWxsUXVp
dFdhaXQ7DQogICAgICAgICAgICAgICAgICAgICBFbnVtZXJhdGlvbiB0aW1lb3V0TG9ja1RhYmxl
ID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgICAgIGxvbmcgY3VycmVudFRpbWUgPSAwOw0K
--000e0cd723b0b142b6048998c768--