felix-dev mailing list archives

From David Jencks <david_jen...@yahoo.com>
Subject Re: SCR concurrency issues (cf FELIX-3456)
Date Fri, 27 Apr 2012 16:47:02 GMT

On Apr 27, 2012, at 4:23 AM, Felix Meschberger wrote:

> Hi,
> 
> On 20.04.2012 at 00:19, David Jencks wrote:
> 
>> We've run into one definite concurrency problem in SCR and I've been discussing offline with a colleague how to fix it and wanted to get the discussion out in the open.
>> 
>> The original symptom appeared when 2 mandatory service refs were satisfied on different threads at once: the 2nd wasn't recognized, so the component never got activated.
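A minimal Java sketch of the race and the synchronizing fix, assuming a hypothetical `MandatoryRefTracker` class (not SCR's actual component manager). Marking a reference satisfied and checking whether all mandatory references are now satisfied happen as one atomic step, so two threads cannot each miss the other's update. Because the activate call would run while the monitor is held, this is also where the deadlock risk comes from.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a component with several mandatory references;
// not SCR's actual component manager.
class MandatoryRefTracker {
    private final Set<String> required;
    private final Set<String> satisfied = new HashSet<>();
    private boolean active = false;
    private int activations = 0;

    MandatoryRefTracker(Set<String> required) {
        this.required = Set.copyOf(required);
    }

    // Called from whatever thread delivers the service event. Without
    // "synchronized", two threads could each add their reference and then see
    // the other one still missing, so neither would activate the component.
    synchronized void serviceAdded(String ref) {
        if (!required.contains(ref)) {
            return;
        }
        satisfied.add(ref);
        if (!active && satisfied.containsAll(required)) {
            active = true;
            activations++; // the activate method would be called here, under the lock
        }
    }

    synchronized boolean isActive() {
        return active;
    }

    synchronized int activations() {
        return activations;
    }
}
```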
>> 
>> This is easily solved by synchronizing, but that introduces a risk of deadlocks (my first attempt, https://issues.apache.org/jira/secure/attachment/12522537/FELIX-3456-1.diff).
> 
> Yes
> 
>> 
>> We tried some partly asynchronous approaches such as https://issues.apache.org/jira/secure/attachment/12523313/FELIX-3456-4.diff. Unless there's a timeout (presumably due to deadlock), this gets all service events processed before the thread exits from its first call into SCR. However, this can result in service events being processed later than one expects, possibly on a different thread. On further thought we concluded that a service event must be processed fully before the service registration call returns. We therefore don't think any kind of asynchronous approach will work.
> 
> Yes. For activation, it might cause SCR not to finish processing before the synchronous bundle event handling ends. More importantly, though, unbinding services must be handled synchronously to prevent errors in the components caused by SCR calling the unbind methods when the bound service object is already invalid.
> 
> 
>> 
>> We've discovered the anti-circular-dependency clause in the spec (112.3.5), but it appears to be overly biased towards SCR-only graphs of services. We are leaning towards thinking that SCR also needs to consider:
>> 
>> - an activate method registers a service that satisfies an optional dependency of a component being activated by SCR on the same thread.
>> - the same, except the activate method starts a new thread to register the service and waits for it to complete.
>> 
> 
> You can come up with lots of scenarios here. The thing is always that an event for the component may have to be processed while its state is changing. This is particularly problematic during activation and deactivation (due to missing dependencies).
> 
>> Another scenario to consider is
>> 
>> components C1 and C2 registering as services, each with an optional dynamic dependency on the other. If one starts, and then the other, there is no problem: they both get references to the other. If they both start at the same time in separate threads (either because they are in different bundles or because they get activated due to mandatory references being satisfied) and register their services while the other is in the Activating state, a simple lock over the service event processing will result in deadlock. Furthermore, to get the correct result, at least one of the services has to be bound while the component to which it is binding is in the Activating state.
> 
> Dynamic binding of optional services is not a big issue, because this is known to happen at any time and because such events are fully processed, calling the bind and unbind methods, even during activation.
> 
>> 
>> It looks like the situation can be simplified a bit by considering, for service events, whether the dependency will result in a state change: if it's optional, or mandatory but not the only satisfying service, it won't; but if it's mandatory and the first satisfying service, it will. We can calculate this before calling any bind methods or activate methods. After determining this, we know the final state of the component.
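The rule can be written as a small predicate. This is a sketch with hypothetical names (`Cardinality`, `StateChangeCheck.addCausesStateChange`); the two-value `Cardinality` is a simplification of SCR's 0..1/0..n/1..1/1..n cardinalities, and, like the rule itself, the check looks at one reference only and ignores whether the component's other mandatory references are satisfied.

```java
// Hypothetical simplification of a reference's cardinality.
enum Cardinality { OPTIONAL, MANDATORY }

final class StateChangeCheck {
    private StateChangeCheck() {
    }

    // Decides, before any bind or activate method is called, whether a
    // "service added" event for one reference can change the component's state.
    // satisfyingBefore = number of matching services tracked before this event.
    static boolean addCausesStateChange(Cardinality cardinality, int satisfyingBefore) {
        if (cardinality == Cardinality.OPTIONAL) {
            return false;               // optional: never changes the state
        }
        return satisfyingBefore == 0;   // mandatory: only the first satisfying service does
    }
}
```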
> 
> SCR already does this, but it only considers the impact of the single reference. It does not take any other references into account.
> 
>> 
>> We're considering whether some kind of 2-stage lock would work:
>> 
>> one level can change the state and blocks all other threads
>> the other level can't change the state and lets stuff like service events for non-state-changing service references be processed according to the final state of the component (e.g. activating will let bind methods be called on the under-configuration object).
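One possible reading of the two levels, sketched in Java with hypothetical names: a `ReentrantLock` serializes state-changing operations, while the target state is published through a `volatile` field before the activate method runs, so a bind for an optional reference (which cannot change the state) does not wait for that lock. The usage mirrors the scenario of an activate method that starts a new thread to register a service and waits for it; with a single lock around all event processing, that thread would block forever. The sketch simply ignores binds for components that are not active, where real code would still have to remember the tracked service.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not SCR code.
enum TargetState { UNSATISFIED, ACTIVE }

class TwoLevelComponent {
    // Level 1: serializes state-changing operations (activation, deactivation).
    private final ReentrantLock stateChangeLock = new ReentrantLock();
    // The final state is decided and published before any user callback runs,
    // so non-state-changing events can act on it without taking the lock.
    private volatile TargetState target = TargetState.UNSATISFIED;
    private final List<String> bound = new CopyOnWriteArrayList<>();

    void activate(Runnable activateMethod) {
        stateChangeLock.lock();
        try {
            target = TargetState.ACTIVE;
            activateMethod.run(); // user code; may trigger service events
        } finally {
            stateChangeLock.unlock();
        }
    }

    // Level 2: an optional reference cannot change the state, so its bind is
    // not blocked by an activation in progress on another thread.
    void bindOptional(String service) {
        if (target == TargetState.ACTIVE) {
            bound.add(service);
        }
    }

    List<String> bound() {
        return List.copyOf(bound);
    }
}
```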
>> 
>> This does not yet consider bundle-event-driven state changes, deactivation, delayed component creation, or service factories.
>> 
>> Comments and more scenarios to consider are more than welcome.
> 
> I would rather come back to a proposal I already made on the bug:
> 
> If a service or configuration event takes place while the component is in the transient Activating state, the event is placed into a special queue for further processing. When the transient state is exited, the queue is checked for further actions to take place.
> 
> There is only a small number of situations:
> 
>   * Service added: This must be handled
>   * Service removed: Might deactivate the component immediately.
>   * Config update or delete: Might deactivate the component
> 
> The problem here is the removal of a service while the component is being activated. When we queue this event and handle it later, the service has already gone and will be in an undefined/unusable state, causing problems. But there is probably not much we can do about this, because the component might be in the activate method and synchronizing at this point in time risks deadlocks.
> 
> Thus, I think the queue for post-processing while in the Activating state sounds like the most sensible thing to do (with some small remaining window for things going wrong). This is as easy as implementing the deactivate and activate methods in the Activating state to enqueue these requests.
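The queueing proposal can be sketched as follows (hypothetical `QueueingComponent`, with events reduced to strings and activation failures ignored). While the component is `ACTIVATING`, events are appended to a queue instead of being handled, and the activating thread drains the queue before switching to `ACTIVE`; the monitor is held only to change the state or touch the queue, never while the activate method runs. Note that `onEvent` can return before the event has actually been handled, which is the behaviour David objects to in his reply.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of queueing events during the transient Activating state.
enum CompState { UNSATISFIED, ACTIVATING, ACTIVE }

class QueueingComponent {
    private CompState state = CompState.UNSATISFIED;
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> handled = new ArrayList<>();

    void activate(Runnable activateMethod) {
        synchronized (this) {
            if (state != CompState.UNSATISFIED) {
                return;
            }
            state = CompState.ACTIVATING;
        }
        activateMethod.run(); // user code runs without holding the monitor
        // Leaving the transient state: process everything queued meanwhile.
        while (true) {
            String event;
            synchronized (this) {
                event = pending.poll();
                if (event == null) {
                    state = CompState.ACTIVE; // set under the same lock, so no event is lost
                    return;
                }
            }
            handle(event);
        }
    }

    void onEvent(String event) {
        synchronized (this) {
            if (state == CompState.ACTIVATING) {
                pending.add(event); // caller returns before the event is handled
                return;
            }
        }
        handle(event);
    }

    // Stands in for bind/unbind or deactivation; guarded by its own lock.
    private void handle(String event) {
        synchronized (handled) {
            handled.add(event);
        }
    }

    List<String> handled() {
        synchronized (handled) {
            return new ArrayList<>(handled);
        }
    }

    synchronized CompState state() {
        return state;
    }
}
```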

I think this is what's implemented in https://issues.apache.org/jira/secure/attachment/12523061/FELIX-3456-3.diff. I just don't think it works. Either you return from some events without having done the work promised, or you do it in a different order than expected. Either way, you're better off with locks and possible timeouts and exceptions.
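For comparison, a sketch of the "locks and possible timeouts and exceptions" alternative, assuming a hypothetical `TimedLock` wrapper around a component's lock. `ReentrantLock.tryLock(long, TimeUnit)` waits only a bounded time, so a thread that cannot get the lock (for example in the C1/C2 scenario) fails with an exception instead of hanging. The timeout value is an arbitrary assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: acquire a component's lock, but fail loudly instead of
// waiting forever on a possible deadlock.
final class TimedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long timeoutMillis;

    TimedLock(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void runLocked(Runnable action) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for component lock", e);
        }
        if (!acquired) {
            throw new IllegalStateException(
                "could not acquire component lock within " + timeoutMillis + " ms; possible deadlock");
        }
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```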

still thinking...
david jencks

> 
> Regards
> Felix

