db-derby-dev mailing list archives

From "Bryan Pendleton (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-1326) Network server may abandon sessions when Derby system is shutdown and this causes intermittent hangs in the client
Date Sun, 11 Jun 2006 17:49:31 GMT
     [ http://issues.apache.org/jira/browse/DERBY-1326?page=all ]

Bryan Pendleton updated DERBY-1326:

    Attachment: unify_NSImpl_instances.diff

Attached is 'unify_NSImpl_instances.diff', with these notes to explain. This
patch belongs with the rest of the DERBY-1326 changes, but I've extracted it
out into a standalone patch to make it easier to study and review.

Recently, I've been investigating the hang that Deepa uncovered, which occurs
with the latest patch proposal for DERBY-1326.

Investigating that problem has taken me through some interesting analysis
of NetworkServer startup/shutdown issues, which I'd like to describe here, to
see what sort of reaction I get from the team.

There are several different ways to start up the Network Server, but after
some twists and turns there appear to be two essential techniques:
1) You can instantiate a NetworkServerControl object, and call the start()
   method on it.
2) You can define the derby.drda.startNetworkServer property to true, and then
   start the embedded engine.
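For reference, the two techniques look roughly like this. This is only a sketch: the NetworkServerControl and EmbeddedDriver calls are shown as comments because they need the Derby jars on the classpath, and the executable part just mirrors the property check that gates the second technique.

```java
public class StartupPaths {
    public static void main(String[] args) {
        // Technique 1 (requires derbynet.jar; shown as a comment only):
        //   NetworkServerControl server = new NetworkServerControl();
        //   server.start(new PrintWriter(System.out, true));
        //
        // Technique 2: set the property, then boot the embedded engine:
        //   System.setProperty("derby.drda.startNetworkServer", "true");
        //   Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        //
        // The property check that gates technique 2 amounts to this:
        System.setProperty("derby.drda.startNetworkServer", "true");
        boolean startServer = Boolean.getBoolean("derby.drda.startNetworkServer");
        System.out.println("startNetworkServer=" + startServer);
    }
}
```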

These two code paths proceed somewhat differently, but end up getting to the
same place:
1) NetworkServerControl.start() delegates to NetworkServerControlImpl.start(),
   which calls NetworkServerControlImpl.startNetworkServer(), then instantiates
   a DRDAServerStarter object and calls its boot() method.
2) Loading the embedded engine eventually calls JDBCBoot.boot(), which notices
   that derby.drda.startNetworkServer has been set, and calls
   Monitor.startSystemModule() to instantiate a DRDAServerStarter object and
   call its boot() method.

The DRDAServerStarter.boot() method uses reflection to:
 - dynamically load the Network Server code
 - create an instance of NetworkServerControlImpl
 - create a daemon server thread, which will then:
 - call NetworkServerControlImpl.blockingStart()
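A minimal, self-contained sketch of that boot sequence, using a hypothetical ServerImpl class as a stand-in for NetworkServerControlImpl (the real code loads the actual Network Server class by name):

```java
import java.util.concurrent.CountDownLatch;

class BootSketch {
    // Hypothetical stand-in for NetworkServerControlImpl.
    public static class ServerImpl {
        static final CountDownLatch started = new CountDownLatch(1);
        public void blockingStart() { started.countDown(); }
    }

    public static void main(String[] args) throws Exception {
        // 1. Dynamically load the server class by name, as boot() does.
        Class<?> cls = Class.forName(ServerImpl.class.getName());
        // 2. Create an instance via reflection.
        ServerImpl impl = (ServerImpl) cls.getDeclaredConstructor().newInstance();
        // 3. Create a daemon server thread that calls blockingStart().
        Thread server = new Thread(impl::blockingStart);
        server.setDaemon(true);
        server.start();
        ServerImpl.started.await();  // demo only: wait so we can observe startup
        System.out.println("server thread ran blockingStart()");
    }
}
```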

So whichever way we start the Network Server, we end up constructing a
NetworkServerControlImpl instance and calling blockingStart() on that
instance from a background thread. This is good.

However, there is an important and interesting difference between the two
startup methods:

   DRDAServerStarter.boot() always creates its own instance of the
   NetworkServerControlImpl object and calls blockingStart() on that instance.
   So if you start the Network Server by instantiating a NetworkServerControl
   object in your own code, that NetworkServerControl instance creates a
   NetworkServerControlImpl instance, but then when it calls DRDAServerStarter,
   we end up creating a *second* NetworkServerControlImpl instance to use as
   the master instance, and we never really use the NetworkServerControlImpl
   instance that was used by the NetworkServerControl object that you created.

Worse, *both* NetworkServerControlImpl instances are started up, in a certain
sense, because NetworkServerControlImpl.startNetworkServer() is called on
each instance:
 - NetworkServerControlImpl.start() calls startNetworkServer() on itself before
   it instantiates the DRDAServerStarter object and calls boot()
 - NetworkServerControlImpl.blockingStart() calls startNetworkServer() on
   itself as its first action.

So if you start up the Network Server by calling NetworkServerControl.start(),
you actually end up creating *two* Network Server instances, although one of
them doesn't do very much. This is the source of the hang that I introduced
with the most recent patch proposal for DERBY-1326: when I changed the
startNetworkServer() method to create a new thread, that code ended up being
called twice, but we only shut down one of the two instances, so the extra
thread was never shut down, which caused the process to hang rather than
terminate.
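The double instantiation can be boiled down to a few lines. These are hypothetical stand-ins, not the real classes, but they follow the call structure described above:

```java
class DoubleInstanceSketch {
    static int instances = 0;

    // Stand-in for NetworkServerControlImpl.
    static class Impl {
        Impl() { instances++; }
        void startNetworkServer() { /* loads engine, sets up state */ }
        void blockingStart() { startNetworkServer(); /* then accept loop */ }
    }

    // Stand-in for DRDAServerStarter: it always builds its own instance.
    static class Starter {
        void boot() {
            Impl second = new Impl();
            second.blockingStart();
        }
    }

    public static void main(String[] args) {
        Impl first = new Impl();     // held by the user's NetworkServerControl
        first.startNetworkServer();  // started once here...
        new Starter().boot();        // ...and the starter starts a *second* one
        System.out.println("instances created: " + instances);
    }
}
```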

Clearly, one way out of this problem is to re-arrange the thread creation logic
so that the thread is only created when the *real* NetworkServerControlImpl
instance is started.

But I was curious about why there were two NetworkServerControlImpl instances,
since it seems quite clear to me that the NetworkServerControlImpl class is
designed to behave as a singleton object. Since I made the assumption that
there would only ever be a single instance in a Java app, others might well
make the same assumption, and more bugs like the one I accidentally
introduced could creep in later.

So, for the last week or so, I've been studying this code, trying to figure
out if we really need to have two NetworkServerControlImpl instances, or
whether it would be cleaner to just have a single instance.

My conclusion is that, for the most part, we *don't* need to have both
instances, and it *is* cleaner to just have a single instance, but there
are a few messy details.

The changes that I've been experimenting with in this area are as follows:
1) Refactor the DRDAServerStarter code slightly so that the caller can
   optionally pass in the NetworkServerControlImpl instance to use. If the
   caller does not pass in an instance, DRDAServerStarter creates one via
   reflection, as it does now.
2) Simplify NetworkServerControlImpl.start() so that it no longer calls
   startNetworkServer() on itself, but instead simply calls DRDAServerStarter,
   passing the "this" pointer in as the instance to use for starting the server.
3) Modify the DRDAServerStarter boot code so that it loads the embedded
   engine. The current DRDAServerStarter code assumes that the embedded engine
   has already been started; I believe that this is the main reason for the
   current code's call to startNetworkServer() from NetworkServerControlImpl's
   start() method, as a side effect of startNetworkServer() is to load the
   embedded engine. It seems simple to have DRDAServerStarter load the
   embedded engine itself, but an alternative would be to load the embedded
   engine in NetworkServerControlImpl.start().
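Change 1 above might be shaped roughly like this. Again these are hypothetical stand-in classes, and the real code would create the fallback instance via reflection rather than with a plain constructor:

```java
class UnifiedStarterSketch {
    static int created = 0;

    // Stand-in for NetworkServerControlImpl.
    static class Impl {
        Impl() { created++; }
        void blockingStart() { /* accept loop */ }
    }

    // Stand-in for DRDAServerStarter with an optional caller-supplied instance.
    static class Starter {
        private final Impl provided;
        Starter(Impl provided) { this.provided = provided; }  // null = none given

        void boot() {
            // Use the caller's instance if one was passed in; otherwise
            // create one (the real code would do this via reflection).
            Impl impl = (provided != null) ? provided : new Impl();
            Thread server = new Thread(impl::blockingStart);
            server.setDaemon(true);
            server.start();
        }
    }

    public static void main(String[] args) {
        // start() path: pass "this", so no second instance is created.
        new Starter(new Impl()).boot();
        // property path: no instance supplied, so the starter makes one.
        new Starter(null).boot();
        System.out.println("instances created: " + created);
    }
}
```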

With these changes in place, the Network Server startup and shutdown code
seems substantially cleaner to me, and the tests seem to run very well.
I'm pleased with the results.

However, there is one messy detail that remains to be resolved, and it
involves the Network Server shutdown processing. The shutdown() method for
the Network Server, in NetworkServerControlImpl.shutdown(), uses the following
approach:
 - tell the server to shut down
 - loop for a while, pinging the Network Server, until we get an error

Unfortunately, during this "ping loop", the code wants to ensure that no
stray messages from the failed ping operation escape to the user, so the
code intentionally disables the Network Server console.

When there were two NetworkServerControlImpl instances in play, intentionally
disabling the console of the one instance was a relatively safe thing to do.

However, when there is only a single instance in the process, disabling the
console is a quite disruptive action, because it affects *all* the messages
that the Network Server might want to print, including the "normal shutdown"
message that the Network Server prints at the conclusion of shutdown.

Therefore, I'm coming to believe that we shouldn't disable the console
during shutdown, but rather should implement a variant "ping" operation which
doesn't print any messages when it fails, but instead just throws an exception,
which the shutdown code could catch and handle. Unfortunately, this is not
a trivial one-line change, as the code which prints error messages to the
Network Server console is buried *deep* in the Network Server command
processing code.
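A quiet ping variant along those lines might be shaped like this. This is purely illustrative, with a fake ping that fails after a few calls; the real fix has to thread through the Network Server command-processing code:

```java
class ShutdownSketch {
    static int pingsLeft = 3;  // pretend the server answers three more pings

    // Quiet ping: throws on failure instead of printing to the console.
    static void pingQuietly() throws Exception {
        if (pingsLeft-- <= 0) throw new Exception("connection refused");
    }

    public static void main(String[] args) throws Exception {
        // (tell the server to shut down here, elided)
        int successfulPings = 0;
        while (true) {
            try {
                pingQuietly();
                successfulPings++;
                Thread.sleep(10);  // real code waits between pings
            } catch (Exception e) {
                break;  // ping failed: server is down; nothing was printed
            }
        }
        System.out.println("server down after " + successfulPings + " pings");
    }
}
```

Because the failure is signaled by an exception rather than a console message, the shutdown code can catch it without disabling the console, so the "normal shutdown" message still gets printed.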

Obviously, I've still got more work to do here. But I'm attaching the current
state of my changes anyway, since I think it's a useful intermediate stage
and would be interesting to reviewers.

The other comment to be made is that this change is now becoming quite large,
and probably should be split into multiple smaller patches. I'll pursue that
strategy later, once I get to a point where I have code that seems solid.

Thanks in advance for all comments, suggestions, and other feedback!

> Network server may abandon sessions when Derby system is shutdown and this causes intermittent hangs in the client
> ------------------------------------------------------------------------------------------------------------------
>          Key: DERBY-1326
>          URL: http://issues.apache.org/jira/browse/DERBY-1326
>      Project: Derby
>         Type: Bug

>   Components: Network Server
>     Reporter: Deepa Remesh
>     Assignee: Bryan Pendleton
>  Attachments: repro1326.java, resolve_DRDConnThread_conflict.diff, sessionMgmt1.diff, sessionMgmt1_and_nosessionsforclosedthreads.diff, unify_NSImpl_instances.diff, withNewThreadAndNoShutdownOfCurrentSession.diff
> This issue was found when working on DERBY-1219. More details can be found in the comments at http://issues.apache.org/jira/browse/DERBY-1219

This message is automatically generated by JIRA.