river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: TaskManager progress
Date Thu, 05 Aug 2010 21:41:41 GMT
addProxyReg is called from the LookupCacheImpl constructor and from the 
discover method of a DiscoveryListener attached to a 
LookupDiscoveryManager. I don't see any specific rules about thread 
choice for the discover call, but as far as I can tell the lookup 
discovery manager does not know anything about the service discovery 
manager's cache, and so is unlikely to use its TaskManager.

A LookupTask is created in the run method of a RegisterListenerTask, 
which is the class of task that addProxyReg adds to the cacheTaskMgr 
TaskManager. RegisterListenerTask inherits the CacheTask runAfter method 
that just returns false, so there is nothing to stop it from running in 
parallel with any older task.

Looks like two or more different threads to me. I think a simpler 
explanation for the lack of problem reports lies in two effects:

1. Almost all the time, addProxyReg will win the race. It has a 
significant running start. Immediately after the end of its synchronized 
block, it drops straight into calling the TaskManager add method. 
Meanwhile, another thread that was at the start of its synchronized 
block would have to go through creating a new Task object before 
attempting the add call. For addProxyReg to lose the race for the add 
method's TaskManager synchronization would require a cache miss or an 
interrupt in a window a few instructions long.

2. The symptom, if any, would be cache confusion that would be very hard 
to distinguish from a variation on 
https://issues.apache.org/jira/browse/RIVER-324. If there were any 
observations of this problem before RIVER-324 was fixed, they would have 
been conflated with it and presumed fixed.

I am a strong believer in closing even the tiniest timing windows, 
because they can collectively lead to general flakiness even if there 
are no reproducible bug reports.


On 8/5/2010 6:13 AM, Gregg Wonderly wrote:
> I haven't looked yet, but a quick thought I had was, are the other
> dependent instances (higher sequence numbers) actually created on a
> separate thread, so that ordering could be compromised, or are they just
> created in later in the same threads execution path?
> Gregg Wonderly
> Peter Firmstone wrote:
>> Thanks Patricia, sharp eyes!
>> Cheers,
>> Peter.
>> Patricia Shanahan wrote:
>>> ServiceDiscoveryManager.LookupCacheImpl.taskSeqN is used to
>>> initialize sequence number fields in some CacheTask subclasses that
>>> are then used in runAfter methods.
>>> ServiceDiscoveryManager.LookupCacheImpl.addProxyReg creates a task
>>> with incremented taskSeqN inside a serviceIdMap synchronized block,
>>> but adds it to the cacheTaskMgr outside the block.
>>> public void addProxyReg(ProxyReg reg) {
>>> RegisterListenerTask treg;
>>> synchronized(serviceIdMap) {
>>> treg = new RegisterListenerTask(reg, taskSeqN++);
>>> }//end sync(serviceIdMap)
>>> cacheTaskMgr.add(treg);
>>> }//end LookupCacheImpl.addProxyReg
>>> In the remaining sequence numbered CacheTask subclass cases, the
>>> cacheTaskMgr.add call is done inside the same
>>> synchronized(serviceIdMap) block as the taskSeqN increment.
>>> There is a window in addProxyReg during which a LookupTask or
>>> NotifyEventTask with a higher sequence number could be added before
>>> the RegisterListenerTask task addProxyReg is adding. The LookupTask
>>> and NotifyEventTask runAfter methods both wait for any
>>> RegisterEventTask with a lower sequence number.
>>> I believe this is a bug, but it will be hard to construct a test case
>>> because it involves such a narrow timing window. I do recommend
>>> moving the cacheTaskMgr.add(treg) statement inside the synchronized
>>> block.

View raw message