db-derby-dev mailing list archives

From Øystein Grøvlen <Oystein.Grov...@Sun.COM>
Subject Re: DERBY-800, owned by Øystein Grøvlen, Is it time to move this test out of nightly regression suite?
Date Fri, 10 Feb 2006 06:33:28 GMT
Mike Matrigali wrote:
> I have seen this fail 2 or 3 times this week with various deadlocks, my
> assumption is that the problem is a test problem and that it needs to
> be changed to handle deadlocks.
> I haven't seen requests for extra info from new cases, so my assumption
> is that no new value is being gained by others running into this known
> JIRA issue.
> Is it time to move this out of the suite until a fix is submitted?

I am sorry that it has taken me some time to report back on this.  I
have been on the road a lot lately (for various reasons), and I am
trying to catch up now.  I have run the test with some tracing to see
what caused the lock timeouts.  I have not quite got to the bottom of
it, but so far it seems to me that it is not a deadlock scenario, just
timeouts due to long queues on the dictionary lock.  (See below for
more info.)

Since creating 100 tables in parallel is not a common scenario, I am not
sure whether it is worth the effort to attempt to fix this so the test
runs cleanly.  I was about to suggest that we should just remove the
test from derbyall.  The test was made to test a fix (DERBY-230) for a
problem that I do not think is very likely to reoccur.  Unless someone
protests, this is what I will do.

A more detailed description of what I have found:
When a thread tries to create a table, it will first get a shared lock
on the dictionary (DataDictionaryImpl.startReading).  This is released
before it tries to lock the dictionary exclusively.  The way
DataDictionaryImpl.startWriting works is that it first checks whether
someone is holding a lock on the dictionary.  If so, it will sleep for a
while and then try again.  This goes on for a while until it gets
impatient and actually requests an exclusive lock and enters the lock
queue.  In the meantime, many more threads have acquired a shared lock,
and the updating thread will have to wait for all of them to release it.
This causes the thread to time out.  I have not tested whether it
would help if we did not allow readers to acquire locks while a writer
is waiting, and I do not know what general consequences that might
have.  However, since this does not seem to create problems for normal
load, I doubt that it is worth the effort to do anything about it.
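
The reader/writer interaction described above can be sketched roughly as
follows.  This is a simplified, hypothetical model and not Derby's lock
manager or the DataDictionaryImpl code: the class DictionaryLockSketch,
its method names, and the blockReadersWhileWriterWaits flag are all
assumptions made for illustration, and the sleep-and-retry phase that
precedes the real lock request is left out.

```java
/**
 * Hypothetical sketch of a shared/exclusive "dictionary" lock.
 * Not Derby code; names and behaviour are assumptions for illustration.
 */
class DictionaryLockSketch {
    private int readers = 0;            // threads currently holding the shared lock
    private boolean writerActive = false; // true while the exclusive lock is held
    private int waitingWriters = 0;     // writers queued for the exclusive lock
    // If true, new readers are refused while any writer is queued.
    private final boolean blockReadersWhileWriterWaits;

    DictionaryLockSketch(boolean blockReadersWhileWriterWaits) {
        this.blockReadersWhileWriterWaits = blockReadersWhileWriterWaits;
    }

    /** Try to take the shared lock; returns false if it is not granted. */
    synchronized boolean tryStartReading() {
        if (writerActive) {
            return false;
        }
        if (blockReadersWhileWriterWaits && waitingWriters > 0) {
            return false; // give the queued writer a chance to get in
        }
        readers++;
        return true;
    }

    /** Release the shared lock and wake any waiting writer. */
    synchronized void doneReading() {
        readers--;
        notifyAll();
    }

    /**
     * Wait for the exclusive lock until the timeout expires.
     * Returns false on timeout, which models the lock timeout in the test.
     */
    synchronized boolean startWriting(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        waitingWriters++;
        try {
            while (writerActive || readers > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // still blocked by readers: time out
                }
                try {
                    wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            writerActive = true;
            return true;
        } finally {
            waitingWriters--;
        }
    }

    /** Release the exclusive lock and wake waiting threads. */
    synchronized void doneWriting() {
        writerActive = false;
        notifyAll();
    }
}
```

With the flag set to false, readers that keep arriving while a writer is
queued can keep the reader count above zero until the writer's timeout
expires, which is the behaviour described above.  With the flag set to
true, tryStartReading refuses new readers as soon as a writer is queued,
so the writer only has to outwait the readers that already hold the lock.
Whether such a change would be safe in Derby itself is not something this
sketch can show.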

