db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-989) unit/daemonService.unit fails intermittently: 'ran out of time'
Date Tue, 04 Jul 2006 18:28:31 GMT
    [ http://issues.apache.org/jira/browse/DERBY-989?page=comments#action_12419163 ] 

Knut Anders Hatlen commented on DERBY-989:

Thank you Andreas and Bryan for reviewing the patch.

Bryan wrote:

> 1) I don't really see why unsubscribe needs to block if it discovers
> that the service record being unsubscribed is currently performing
> work. Is this a necessary element of the fix, or just a side effect
> of the implementation?

It is a necessary element of the fix. The test case that fails does
the following (everything not related to the failure stripped away):

  1 subscribe client
  2 unsubscribe client
  3 save the number of times the client has been serviced
  4 do some other testing
  5 check that the number of times the client has been serviced has
    not increased

If unsubscribe() returns while the client is being serviced, parts of
performWork() might be executed after 3 and before 5. Since
performWork() is updating the number of times the client has been
serviced, the check in 5 fails. The only way we can guarantee that
performWork() is not running after 3 is by waiting in unsubscribe().

However, my fix assumes that the test is correct. I don't see anything
in the code/comments saying that performWork() should never be run
after unsubscribe() has returned. Actually, the javadoc in BasicDaemon

  Furthermore, any Serviceable subscriptions, including onDemandOnly,
  must tolerate spurrious services.

This could mean that the current behaviour is as expected/designed and
the test should be changed. It should be as easy as changing point 5
to "check that the number of times the client has been serviced has
not increased by more than one".

Does anyone have opinions on whether we should fix the Derby code or
the test? I'm fine with either one of them.

> 2) I think that it is dangerous for unsubscribe to ignore interrupts
> while waiting for the service record to complete its work. I think
> that if unsubscribe gets an interrupt at this time, it should
> abandon the wait and return, or it should throw an exception.

I agree. We can't throw an exception because it would break the
contract of the DaemonService interface, but I think it's reasonable
to abandon the wait and return.

> 3) Introducing the "client number" concept into the service record
> seems awkward, as it doesn't seem like a natural part of the service
> record implementation. Also, comparing client numbers as a way to
> answer the question "is this service record already unsubscribed"
> seems somewhat convoluted. Here are two alternate ideas that
> occurred to me:
>  - perhaps we could refer to the new Service Record field as a
>  "unique id", and have an "equals" method to compare two
>  ServiceRecord objects, so that we don't so directly couple the
>  Client Number concept from Basic Daemon into the ServiceRecord
>  class.

I don't think we need to have decoupling of ServiceRecord from
BasicDaemon as a goal. ServiceRecord is a package protected class and
its only purpose is to aid the implementation of BasicDaemon. It is
purely an implementation detail of BasicDaemon, and it is not related
to the DaemonService interface in any way.

>  - perhaps there should be an "subscribed" boolean flag in Service
>  Record, with methods "isSubscribed" and "setSubscribed" so that
>  when Basic Daemon wants to ask if a ServiceRecord is already
>  unsubscribed, it could do so directly rather then by fetching the
>  client number and testing it against -1 and so forth.

I agree that this sounds cleaner. Thanks for the tip!

> 4) It seems a little odd that BasicDaemon sometimes synchronizes on
> the "this" object, and sometimes depends on the fact that
> "subscription" is a Vector and thus is inherently synchronized. I'm
> not saying that anything is necessarily wrong here, it was just a
> red flag to me that there were two levels of synchronization being
> used in the BasicDaemon methods.

I'm not a fan of this either. However, in this case it seems like the
vector synchronization and the "this" synchronization guard different
parts of the state. Synchronization on "this" ensures that the state
controlling the wait() loops is consistent, and the vector
synchronization ensures that concurrent modifications of the vector
don't interfere with each other. There is probably a cleaner way to
implement BasicDaemon, but that would be outside the scope of this
patch, I think.

Thanks for all your comments! I will wait for comments on my question
about whether this is a test issue or a code issue before submitting a
new patch.

> unit/daemonService.unit fails intermittently: 'ran out of time'
> ---------------------------------------------------------------
>          Key: DERBY-989
>          URL: http://issues.apache.org/jira/browse/DERBY-989
>      Project: Derby
>         Type: Test

>   Components: Regression Test Failure
>     Versions:
>  Environment: OS: Solaris 10 3/05 s10_74L2a X86 - SunOS 5.10 Generic, JVM: Sun Microsystems
Inc. 1.5.0_04
> OS: Solaris 9 9/04 s9s_u7wos_09 SPARC - SunOS 5.9 Generic_118558-11, JVM: Sun Microsystems
Inc. 1.5.0_03
> OS: Red Hat Enterprise Linux AS release 3 (Taroon Update 4) - Linux 2.4.21-27.ELsmp #1
SMP Wed Dec 1 21:50:31 EST 2004 GNU/Linux, JVM: Sun Microsystems Inc. 1.5.0_03
>     Reporter: Ole Solberg
>     Assignee: Knut Anders Hatlen
>     Priority: Minor
>  Attachments: 989-411220-derbyall_derbyall_unit_unit_derby.log, derby-989-timebomb.diff,
derby-989-timebomb.stat, derby-989-v1.diff, derby-989-v1.stat
> "Signature":
> ********* Diff file unit/unit/daemonService.diff
> *** Start: daemonService jdk1.5.0_04 unit:unit 2006-02-14 20:46:42 ***
> 2 del
> < -- Unit Test T_DaemonService finished
> 2 add
> > ran out of time
> > Shutting down due to unit test failure.
> > Exit due to time bomb
> Test Failed.
> *** End:   daemonService jdk1.5.0_04 unit:unit 2006-02-14 21:47:13 ***
> http://www.multinet.no/~solberg/public/Apache/Derby/Limited/testSummary-377800.html [SunOS-5.10

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:

View raw message