Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 1807 invoked from network); 22 Jun 2010 06:46:57 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 22 Jun 2010 06:46:57 -0000 Received: (qmail 69076 invoked by uid 500); 22 Jun 2010 06:46:57 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 68712 invoked by uid 500); 22 Jun 2010 06:46:54 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 68704 invoked by uid 99); 22 Jun 2010 06:46:53 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Jun 2010 06:46:53 +0000 X-ASF-Spam-Status: No, hits=2.9 required=10.0 tests=HTML_MESSAGE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [209.85.160.172] (HELO mail-gy0-f172.google.com) (209.85.160.172) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Jun 2010 06:46:46 +0000 Received: by gyh4 with SMTP id 4so2968705gyh.31 for ; Mon, 21 Jun 2010 23:46:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.56.32 with SMTP id e32mr5504257yba.127.1277189185159; Mon, 21 Jun 2010 23:46:25 -0700 (PDT) Received: by 10.150.182.5 with HTTP; Mon, 21 Jun 2010 23:46:25 -0700 (PDT) In-Reply-To: References: <4C1F2855.404@Sun.COM> Date: Tue, 22 Jun 2010 16:46:25 +1000 Message-ID: Subject: Re: Hung thread after another thread is interrupted From: Luke Quinane To: Derby Discussion Content-Type: multipart/mixed; boundary=000e0cd723b0b142b6048998c768 X-Virus-Checked: Checked by ClamAV on apache.org --000e0cd723b0b142b6048998c768 Content-Type: multipart/alternative; boundary=000e0cd723b0b142ab048998c766 --000e0cd723b0b142ab048998c766 Content-Type: text/plain; charset=ISO-8859-1 Hi Kristian, I've played around with the code a bit during the day and the attached patch seems to fix the problem where another thread is stuck. I've never really look at the Derby code before so I'm not confident that it is correct however it did seem to fix things for me. Cheers, Luke On Tue, Jun 22, 2010 at 11:25 AM, Luke Quinane wrote: > Hi Kristian, > > I've attached our Derby log file that has all the threads that were > interrupted when the problem occurred (quite a few). > > Our application was in the middle of a user initiated query when the user > started another operation. For performance reasons we call shutdown() on the > previous thread pool which causes the interrupt() call on all the threads. > As you can see in the stack traces some threads are querying Lucene and some > are querying Derby. > > Mostly we just want to cancel the current query but from the higher levels > of the application its not apparent if a thread is processing a Lucene or a > Derby query. > > > Thanks again, > > Luke > > > On Mon, Jun 21, 2010 at 6:52 PM, Kristian Waagan wrote: > >> On 21.06.10 09:20, Luke Quinane wrote: >> >>> Hi all, >>> >>> We've seen a problem today where we have several threads querying our >>> database and when one gets interrupted the others are stuck waiting for a >>> lock. Here is the stack trace for the stuck thread(s): >>> >>> daemon prio=4 >>> "DefaultExecutorService-pool-1-thread-47" Id=98 WAITING on >>> org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1 >>> at java.lang.Object.wait(Native Method) >>> - waiting on org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1 >>> at java.lang.Object.wait(Object.java:485) >>> at >>> org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:115) >>> at >>> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:463) >>> at >>> org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(ConcurrentLockSet.java:855) >>> at >>> org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(AbstractPool.java:297) >>> at >>> org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(ConcurrentPool.java:28) >>> at >>> org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(RowLocking2nohold.java:89) >>> at >>> org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:520) >>> at >>> org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapController.java:638) >>> at >>> org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(B2IRowLocking3.java:309) >>> at >>> org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(B2IRowLocking3.java:599) >>> at >>> org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(B2IRowLockingRR.java:105) >>> at >>> org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:305) >>> at >>> org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1585) >>> at >>> org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(BulkTableScanResultSet.java:327) >>> at >>> org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(BulkTableScanResultSet.java:282) >>> at >>> org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:460) >>> at >>> org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:427) >>> - locked org.apache.derby.impl.jdbc.EmbedConnection40@445d374b >>> at >>> org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:371) >>> ... >>> >>> So is using Thread.interrupt() the correct way to interrupt a long >>> running query? Currently our worker threads are using >>> 'java.util.concurrent.ThreadPoolExecutor' to run our queries and it calls >>> 'interrupt()' when it shuts down. >>> >> >> Hi Luke, >> >> Unfortunately, Derby isn't currently able to cope correctly with getting >> interrupted by Thread.interrupt(). Based on limited information, and as it >> stands today, I think your options are: >> a) code special shutdown logic for Derby >> b) do not invoke Thread.interrupt() >> c) force Derby to not use NIO (?) >> >> This issue seems like something the Derby community has to address, as it >> is easily encountered using Java standard libraries (i.e. >> java.util.concurrent). >> It is not clear to me why the lock system fails. Do you have a stack trace >> from the thread being interrupted? >> >> Also, can you describe your requirements for the process issuing >> Thread.interrupt()? >> I.e, do you want to shut down your application "hard", or just to cancel a >> single running query? >> You may find some helpful information in this thread (see especially the >> posts from Stephan and Knut): >> http://old.nabble.com/cancel-a-running-query--td19903183.html >> >> >> Regards, >> -- >> Kristian >> >> >> >>> This was with Derby 10.5.3.0 and JRE 1.6.0_19-b04 on Windows Server 2008. >>> >>> >>> Thanks in advance, >>> >>> >>> Luke Quinane >>> >>> >>> >> > --000e0cd723b0b142ab048998c766 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi=A0Kristian,

I've played around with the code a bi= t during the day and the attached patch seems to fix the problem where anot= her thread is stuck. I've never really look at the Derby code before so= I'm not confident that it is correct however it did seem to fix things= for me.


Cheers,

Luke


On Tue, Jun 22, 2010 at 11:25 AM= , Luke Quinane <luke@= nuix.com> wrote:
Hi=A0Kristian,

I've = attached our Derby log file that has all the threads that were interrupted = when the problem occurred (quite a few).

Our application was in the middle of a user initiated q= uery when the user started another operation. For performance reasons we ca= ll shutdown() on the previous thread pool which causes the interrupt() call= on all the threads. As you can see in the stack traces some threads are qu= erying Lucene and some are querying Derby.

Mostly we just want to cancel the current query but fro= m the higher levels of the application its not apparent if a thread is proc= essing a Lucene or a Derby query.


Thanks again,

Luke


On Mon, Jun 21, 2010 at 6:52 PM, Kristian Waagan <K= ristian.Waagan@sun.com> wrote:
On 21.06.10 09:20, Luke= Quinane wrote:
Hi all,

We've seen a problem today where we have several threads querying our d= atabase and when one gets interrupted the others are stuck waiting for a lo= ck. Here is the stack trace for the stuck thread(s):

daemon prio=3D4
"DefaultExecutorService-pool-1-thread-47" Id=3D98 WAITING on org.= apache.derby.impl.services.locks.ActiveLock@6e6f45a1
at java.lang.Object.wait(Native Method)
- =A0waiting on org.apache.derby.impl.services.locks.ActiveLock@6e6f45a1 at java.lang.Object.wait(Object.java:485)
at org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.= java:115)
at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Concur= rentLockSet.java:463)
at org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockO= bject(ConcurrentLockSet.java:855)
at org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject= (AbstractPool.java:297)
at org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObje= ct(ConcurrentPool.java:28)
at org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead= (RowLocking2nohold.java:89)
at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapContr= oller.java:520)
at org.apache.derby.impl.store.access.heap.HeapController.lockRow(HeapContr= oller.java:638)
at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnP= age(B2IRowLocking3.java:309)
at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanR= ow(B2IRowLocking3.java:599)
at org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanR= ow(B2IRowLockingRR.java:105)
at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTre= eForwardScan.java:305)
at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeS= can.java:1585)
at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(Bul= kTableScanResultSet.java:327)
at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(= BulkTableScanResultSet.java:282)
at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Bas= icNoPutResultSetImpl.java:460)
at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.ja= va:427)
- =A0locked org.apache.derby.impl.jdbc.EmbedConnection40@445d374b
at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:371)<= br> =A0 =A0 =A0 =A0...

So is using Thread.interrupt() the correct way to interrupt a long running = query? Currently our worker threads are using 'java.util.concurrent.Thr= eadPoolExecutor' to run our queries and it calls 'interrupt()' = when it shuts down.

Hi Luke,

Unfortunately, Derby isn't currently able to cope correctly with gettin= g interrupted by Thread.interrupt(). Based on limited information, and as i= t stands today, I think your options are:
=A0a) code special shutdown logic for Derby
=A0b) do not invoke Thread.interrupt()
=A0c) force Derby to not use NIO (?)

This issue seems like something the Derby community has to address, as it i= s easily encountered using Java standard libraries (i.e. java.util.concurre= nt).
It is not clear to me why the lock system fails. Do you have a stack trace = from the thread being interrupted?

Also, can you describe your requirements for the process issuing Thread.int= errupt()?
I.e, do you want to shut down your application "hard", or just to= cancel a single running query?
You may find some helpful information in this thread (see especially the po= sts from Stephan and Knut): http://old.nabble.com/cancel-a= -running-query--td19903183.html


Regards,
--
Kristian



This was with Derby 10.5.3.0 and JRE 1.6.0_19-b04 on Windows Server 2008.

Thanks in advance,


Luke Quinane





--000e0cd723b0b142ab048998c766-- --000e0cd723b0b142b6048998c768 Content-Type: application/octet-stream; name="interrupted-exception-fix.patch" Content-Disposition: attachment; filename="interrupted-exception-fix.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_gaqdfpju1 SW5kZXg6IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdpbmUvb3JnL2FwYWNo ZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0xvY2tTZXQuamF2YQ0KPT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQ0KLS0t IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdpbmUvb3JnL2FwYWNoZS9kZXJi eS9pbXBsL3NlcnZpY2VzL2xvY2tzL0xvY2tTZXQuamF2YQkocmV2aXNpb24gOTM0OCkNCisrKyBD Oi9Xb3JraW5nL3ZlbmRvci9kZXJieS9kaXN0L2phdmEvZW5naW5lL29yZy9hcGFjaGUvZGVyYnkv aW1wbC9zZXJ2aWNlcy9sb2Nrcy9Mb2NrU2V0LmphdmEJKHJldmlzaW9uIDEyMDA4KQ0KQEAgLTI3 MiwxMyArMjcyLDE4IEBADQogICAgICAgICBsb25nIHN0YXJ0V2FpdFRpbWUgPSAwOw0KIA0KIGZv cmV2ZXI6CWZvciAoOzspIHsNCi0NCi0gICAgICAgICAgICAgICAgYnl0ZSB3YWtldXBSZWFzb24g PSB3YWl0aW5nTG9jay53YWl0Rm9yR3JhbnQoYWN0dWFsVGltZW91dCk7DQotICAgICAgICAgICAg ICAgIA0KKyAgICAgICAgICAgICAgICBieXRlIHdha2V1cFJlYXNvbiA9IDA7DQogICAgICAgICAg ICAgICAgIEFjdGl2ZUxvY2sgbmV4dFdhaXRpbmdMb2NrID0gbnVsbDsNCiAgICAgICAgICAgICAg ICAgT2JqZWN0W10gZGVhZGxvY2tEYXRhID0gbnVsbDsNCiANCiAgICAgICAgICAgICAgICAgdHJ5 IHsNCisgICAgICAgICAgICAgICAgICAgIHRyeSB7DQorICAgICAgICAgICAgICAgICAgICAgICAg d2FrZXVwUmVhc29uID0gd2FpdGluZ0xvY2sud2FpdEZvckdyYW50KGFjdHVhbFRpbWVvdXQpOw0K KyAgICAgICAgICAgICAgICAgICAgfSBjYXRjaChTdGFuZGFyZEV4Y2VwdGlvbiBlKSB7DQorICAg ICAgICAgICAgICAgICAgICAgICAgbmV4dFdhaXRpbmdMb2NrID0gY29udHJvbC5nZXROZXh0V2Fp dGVyKHdhaXRpbmdMb2NrLCB0cnVlLCB0aGlzKTsNCisgICAgICAgICAgICAgICAgICAgICAgICB0 aHJvdyBlOw0KKyAgICAgICAgICAgICAgICAgICAgfQ0KKw0KICAgICAgICAgICAgICAgICAgICAg Ym9vbGVhbiB3aWxsUXVpdFdhaXQ7DQogICAgICAgICAgICAgICAgICAgICBFbnVtZXJhdGlvbiB0 aW1lb3V0TG9ja1RhYmxlID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgICAgIGxvbmcgY3VycmVu dFRpbWUgPSAwOw0KSW5kZXg6IEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rpc3QvamF2YS9lbmdp bmUvb3JnL2FwYWNoZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0NvbmN1cnJlbnRMb2NrU2V0 LmphdmENCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT0NCi0tLSBDOi9Xb3JraW5nL3ZlbmRvci9kZXJieS9kaXN0L2phdmEv ZW5naW5lL29yZy9hcGFjaGUvZGVyYnkvaW1wbC9zZXJ2aWNlcy9sb2Nrcy9Db25jdXJyZW50TG9j a1NldC5qYXZhCShyZXZpc2lvbiA5MzQ4KQ0KKysrIEM6L1dvcmtpbmcvdmVuZG9yL2RlcmJ5L2Rp c3QvamF2YS9lbmdpbmUvb3JnL2FwYWNoZS9kZXJieS9pbXBsL3NlcnZpY2VzL2xvY2tzL0NvbmN1 cnJlbnRMb2NrU2V0LmphdmEJKHJldmlzaW9uIDEyMDA4KQ0KQEAgLTQ1OSwxMyArNDU5LDE4IEBA DQogICAgICAgICBsb25nIHN0YXJ0V2FpdFRpbWUgPSAwOw0KIA0KIGZvcmV2ZXI6CWZvciAoOzsp IHsNCi0NCi0gICAgICAgICAgICAgICAgYnl0ZSB3YWtldXBSZWFzb24gPSB3YWl0aW5nTG9jay53 YWl0Rm9yR3JhbnQoYWN0dWFsVGltZW91dCk7DQotICAgICAgICAgICAgICAgIA0KKyAgICAgICAg ICAgICAgICBieXRlIHdha2V1cFJlYXNvbiA9IDA7DQogICAgICAgICAgICAgICAgIEFjdGl2ZUxv Y2sgbmV4dFdhaXRpbmdMb2NrID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgT2JqZWN0W10gZGVh ZGxvY2tEYXRhID0gbnVsbDsNCiANCiAgICAgICAgICAgICAgICAgdHJ5IHsNCisgICAgICAgICAg ICAgICAgICAgIHRyeSB7DQorICAgICAgICAgICAgICAgICAgICAgICAgd2FrZXVwUmVhc29uID0g d2FpdGluZ0xvY2sud2FpdEZvckdyYW50KGFjdHVhbFRpbWVvdXQpOw0KKyAgICAgICAgICAgICAg ICAgICAgfSBjYXRjaChTdGFuZGFyZEV4Y2VwdGlvbiBlKSB7DQorICAgICAgICAgICAgICAgICAg ICAgICAgbmV4dFdhaXRpbmdMb2NrID0gY29udHJvbC5nZXROZXh0V2FpdGVyKHdhaXRpbmdMb2Nr LCB0cnVlLCB0aGlzKTsNCisgICAgICAgICAgICAgICAgICAgICAgICB0aHJvdyBlOw0KKyAgICAg ICAgICAgICAgICAgICAgfQ0KKw0KICAgICAgICAgICAgICAgICAgICAgYm9vbGVhbiB3aWxsUXVp dFdhaXQ7DQogICAgICAgICAgICAgICAgICAgICBFbnVtZXJhdGlvbiB0aW1lb3V0TG9ja1RhYmxl ID0gbnVsbDsNCiAgICAgICAgICAgICAgICAgICAgIGxvbmcgY3VycmVudFRpbWUgPSAwOw0K --000e0cd723b0b142b6048998c768--