felix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pierre De Rop (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FELIX-3910) Race conditions in DependencyManager
Date Wed, 25 Sep 2013 11:12:06 GMT

    [ https://issues.apache.org/jira/browse/FELIX-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777358#comment-13777358
] 

Pierre De Rop commented on FELIX-3910:
--------------------------------------

I tried to follow your suggestion by reusing the same aspect locking mechanism for non aspect
callbacks.

The patch is not finished but I prefer to attach it now (see FELIX-3910.patch.4) so you can
take a look at.

patch description
=================

- if I understand correctly, the locking methods used by the aspect away methods allow to
serialize the execution of tasks, and other concurrent tasks are blocked until the currently
being executed task is done.
So, I have added the following method which I call from other methods needing to be serialized:

	private void executeExclusivelyOrBlock(Runnable task) {
		enqueueCallback(task);
		waitForCallbackLock(task);
		try {
			execute(task);
		} finally {
			releaseCallback(task);
		}
	}

- in the waitForCallbackLock, I did a minor optimization and replaced "indexOf" usage by this:

    private synchronized void waitForCallbackLock(Runnable runnable) {
    	while (m_injectionQueue.size() > 0 && m_injectionQueue.get(0) != runnable)
{
    		try {
				wait();
			} catch (InterruptedException e) {
			}
    	}
    }

I think this might improve performance, especially when there is a high number of dependency
services.
But perhaps, this optimization might be done later ...

- while doing the patch, I came across another synchronization hole that I did not notice
and fix in the previous FELIX-3910.patch.3 patch: Indeed, there is a subtle issue in the addedService(ServiceReference
ref, Object service) and removed(ServiceReference ref, Object service) methods, which I think
have to be entirely serialized (by reusing your guarded methods): The race condition concerns
the call to makeAvailable(), makeUnavailable() methods ...

So, in the FELIX-3910.patch.4: the addedService/removedService are serialized using the call
to executeExclusivelyOrBlock method. I also serialized the modifiedService, but I am not sure
for now if this is necessary:

    public void addedService(final ServiceReference ref, final Object service) {
    	executeExclusivelyOrBlock(new Runnable() {
			public void run() {
				addedServiceExclusively(ref, service);
			}
		});
    }
    
    private void addedServiceExclusively(ServiceReference ref, Object service) {
       // same code as in addedService, before the patch.
       ...
    }

    public void modifiedService(final ServiceReference ref, final Object service) {
    	executeExclusivelyOrBlock(new Runnable() {
			public void run() {
				modifiedServiceExclusively(ref, service);
			}
		});
    }
    
    private void modifiedServiceExclusively(ServiceReference ref, Object service) {
       // same code as in modifiedService, before the patch.
       ...
    }

    public void removedService(final ServiceReference ref, final Object service) {
    	executeExclusivelyOrBlock(new Runnable() {
			public void run() {
				removedServiceExclusively(ref, service);
			}
		});
    }
    
    private void removedServiceExclusively(ServiceReference ref, Object service) {
       // same code as in removedService, before the patch.
       ...
    }

With these patches, the "RaceTest" test seems to pass seamlessly. I ran it on a multi-core
machine with 32 cores.


Comments/Questions:
==================

1) since the three addedService/modifiedService/removedService are now serialized, I wonder
if the locking blocks are still necessary in the handleAspectAwareAdded methods ?
I think we shall remove the locking blocks from this method because we would then have a deadlock,
since the addedService/removedService are already holding locks (using the executeExclusivelyOrBlock
method).

Do you agree with this ? (I will provide FELIX-3910.patch.5 which does this, if you are OK).

2) in the handleAspectAwareAdded method, I may miss something but I wonder why the
tasks are executed while holding a lock on the "rankings" variable ?

		if (callbackRunnable != null) {
			waitForCallbackLock(callbackRunnable);
			synchronized (rankings) {
				releaseCallback(callbackRunnable);
				execute(callbackRunnable);
			}
		}

Could it be possible to have a deadlock if the callback blocks on a synchronization lock ?
In this case, then why not doing the following: 

		if (callbackRunnable != null) {
			waitForCallbackLock(callbackRunnable);
                        try {
		            execute(callbackRunnable);
			} finally {
	                    releaseCallback(callbackRunnable);
                        }			
		}
3) I'm currently wondering if the invokeAdded(DependencyService service) and invokeRemoved(DependencyService
service) methods shall (or not) also also serialized ?
I think this is not the case ?

4) I have to modify the "RaceTest" test in order to use aspects as well.

thanks again for your feedback; feel free to propose another patch if you think I'm not on
the right direction ...

/Pierre

                
> Race conditions in DependencyManager
> ------------------------------------
>
>                 Key: FELIX-3910
>                 URL: https://issues.apache.org/jira/browse/FELIX-3910
>             Project: Felix
>          Issue Type: Bug
>          Components: Dependency Manager
>    Affects Versions: dependencymanager-3.0.0
>         Environment: jdk1.6, jdk1.7, linux fc16
>            Reporter: Pierre De Rop
>         Attachments: FELIX-3910-patch, FELIX-3910.patch.2, FELIX-3910.patch.3, FELIX-3910.patch.4
>
>
> In a multi threaded context, where dependencies are injected concurrently from different
threads, we  came across some exceptions which seem to take place from dependencymanager.
> I have tried to reproduce the problems using a paxexam test which I will commit.
> Not all exceptions are reproduced by the test case, but I think that the testcase really
reproduces a problem. 
> I also have a candidate patch, which I will submit to this jira issue.
> Here are the exceptions we have seen:
> first stacktrace seen:
> ===============
> ERROR: Bundle test.dm [21] EventDispatcher: Error during dispatch. (java.lang.NullPointerException)
> java.lang.NullPointerException
>         at org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.addedService(ServiceDependencyImpl.java:481)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdded(ServiceTracker.java:1325)
>         at org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:290)
>         at org.apache.felix.dm.tracker.AbstractTracked.track(AbstractTracked.java:236)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1206)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
>         at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>         at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>         at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>         at org.apache.felix.framework.Felix.registerService(Felix.java:3275)
>         at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
>         at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:320)
>         at test.dm.race.RaceTest$RegistrationHelper$1.run(RaceTest.java:104)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> second exceptions:
> ==============
> ERROR: EventDispatcher: Error during dispatch. (java.lang.NullPointerException)
> java.lang.NullPointerException
>         at org.apache.felix.dm.impl.ComponentImpl.invokeCallbackMethod(ComponentImpl.java:686)
>         at org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invoke(ServiceDependencyImpl.java:704)
>         at org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invokeRemoved(ServiceDependencyImpl.java:666)
>         at org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.removedService(ServiceDependencyImpl.java:520)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1351)
>         at org.apache.felix.dm.tracker.AbstractTracked.untrack(AbstractTracked.java:359)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1285)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
>         at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:878)
>         at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:732)
>         at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:662)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:3587)
>         at org.apache.felix.framework.Felix.access$000(Felix.java:40)
>         at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:625)
>         at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:117)
>         at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:128)
>         at org.apache.felix.dm.test.RaceTest$AFactory$2.run(RaceTest.java:151)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message