I attached a patch to GERONIMO-4918 that mostly reverses the dependencies between EjbModuleImpl and EjbDeployment so the deployments will start first.  I think this is the main bit needed for David's idea on how to fix this without the wait() code in the geronimo wiring.  There are some other changes needed before this will work such as removing the lifecycle methods from EjbDeployment.  It also annotates EjbModuleImpl and adds wildcard support to collection valued references, I may well commit this last bit separately.

thanks
david jencks

On Oct 23, 2009, at 5:33 PM, David Blevins wrote:


On Oct 22, 2009, at 1:27 PM, Quintin Beukes wrote:

Unfortunately the only way I could see how to get the @Startup working
was to modify OpenEJB to create a property which gives the
responsibility over to Geronimo, and there the only way was to create
a new GBean which has it's lifecycle doStart() called after all EJBs
are in the RUNNING state.

I couldn't find a better way.

Very impressive that you could find the problem at all as well as a workable fix.  That code makes my brain hurt nearly every time I look at it.

I checked in the more generic Singleton code.  Left out the alternate startup code -- though it was actually pretty clever.

This chunk of code in GeronimoThreadContextListener was not there when we wrote the initial integration, so I went digging around as to why it was added (there shouldn't be any locking code in the integration at all, much less wait/notify):

At the 10,000 foot view, this chunk of code in GeronimoThreadContextListener must die:

   synchronized (deploymentInfo) {
               if (deploymentInfo.get(EjbDeployment.class) == null) {
                   if (!deploymentInfo.isDestroyed()) {
                       try {
                           deploymentInfo.wait();
                       } catch (InterruptedException e) {
                       log.warn("Wait on deploymentInfo interrupted unexpectedly");
                       }
                   }
               }
           }

Seems that code was added for GERONIMO-3780 which is essentially the same as the Singleton injection/lookup issue.  Both issues boil down to the fact that lookups may happen inside the EjbModuleImpl.start() method where ejbs are created by OpenEJB.  The wait/notify block works for MDBs as all their lookups will happen in other threads and not inside the startup thread.  With Singletons this is not the case, so the startup thread ends up waiting on its own thread and a deadlock occurs.

Ultimately, this is flawed and the data that is required in GeronimoThreadContextListener needs to be made available in some way before we call EjbModuleImpl.start().  I talked it over with David Jencks and the EjbDeployment objects are available when the EjbModuleImpl gbean is start.  Then we should be able to hand them directly to the GeronimoThreadContextLister *before* asking OpenEJB to create the EJBs (and subsequently do lookups).  When the contextEntered method is called we can complete any initialization as at that point we will have the CoreDeploymentInfo object and can hook it up with the EjbDeployment object.

So we should be able to get a solution in there that removes all locking code, works for singletons and mdbs, and doesn't require a new OpenEJB release.  Finger's crossed anyway :)

Thanks for all the work bringing it this far.  Really you did all the hard work.  Very *very* appreciated.

-David




















[1]  http://issues.apache.org/jira/browse/GERONIMO-3780