river-dev mailing list archives

From Patricia Shanahan <p...@acm.org>
Subject Re: A new implementation of TaskManager
Date Thu, 08 Jul 2010 05:16:06 GMT
I need to study this. I'll comment when I know more.


Peter Firmstone wrote:
> Hi Patricia,
> This is an example of some timing difficulties, a bug involving Task.
> Perhaps Task can extend Remote?  Then we can pass them around as 
> distributed objects, which will either be a local piece of proxy code 
> executing or a stub.  That was one advantage of allowing Task to contain 
> its dependencies.  If it's sent elsewhere to other nodes, they can add 
> it to their Task dependencies and the result can be retrieved remotely.  
> Perhaps with a getResult() method like RunnableFuture has.
> Of course there are other ways, just passing on thoughts & knowledge, 
> for problem solving.
> There seems to be a GC & concurrency bug in DGC 
> (Distributed Garbage Collection) reported on the list; I'll dig up the 
> details and create a JIRA issue for it.  It causes an exported 
> object to be garbage collected before a stub can contact it; this affects 
> a distributed object that isn't registered as a service.  Among other 
> things, that bug would cause problems for a Remote Task if one were 
> implemented, so it needs to be fixed.
> Cheers,
> Peter.
> Bob Scheifler (JIRA) wrote:
>>      [ https://issues.apache.org/jira/browse/RIVER-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> Bob Scheifler reopened RIVER-324:
>> ---------------------------------
>>       Assignee:     (was: Brian Murphy)
>> Original fix had a nasty flaw.  Fix to fix has been attached.
>>> Under certain circumstances, the ServiceDiscoveryManager internal 
>>> LookupCache implementation can incorrectly process attribute change 
>>> events before the lookup snapshot is processed.
>>> ------------------------------------------------------------------
>>>                 Key: RIVER-324
>>>                 URL: https://issues.apache.org/jira/browse/RIVER-324
>>>             Project: River
>>>          Issue Type: Bug
>>>          Components: net_jini_lookup
>>>    Affects Versions: AR1
>>>            Reporter: Brian Murphy
>>>            Priority: Minor
>>>             Fix For: AR2
>>>         Attachments: river-324-2.diff, river-324.patch
>>> When an attribute change event is received from the
>>> lookup service between the time the cache registers
>>> the event listener and the initial LookupTask takes
>>> the snapshot of the associated service state, the
>>> change event can get processed first, which can
>>> result in incorrect attribute state.
>>> This bug has been observed in a currently deployed
>>> system, generally at startup when the services of
>>> the system are changing their attributes from an
>>> initial, 'unknown' state, to a discovered state that
>>> is shared among those services. What has been
>>> observed is a sequence like the following:
>>> 1. event registration is sent to the lookup service
>>> 2. snapshot is requested (LookupTask is queued)
>>> 3. the lookup service sends back, in the requested
>>>    snapshot, the initial state the service registered
>>>    for itself
>>> 4. the service sends an attribute modification
>>>    request to the lookup service, which sends an
>>>    attribute change event to the cache
>>> 5. before the cache's LookupTask processes the
>>>    snapshot from the lookup service, the event
>>>    arrives and the event processing thread of the
>>>    cache processes the event containing the latest
>>>    state of the service's attributes.
>>> 6. the cache then processes the snapshot, replacing
>>>    the latest, most up-to-date attribute state with
>>>    the original, initial state reflected in the
>>>    snapshot.
>>> 7. the cache now has an incorrect view of the
>>>    service's state.
>>> Bob Scheifler has implemented a simple fix, which
>>> is (quoting Bob) "to have the LookupTask execute the
>>> tasks it creates directly, rather than queueing
>>> them." That is, force any pending snapshot processing
>>> tasks to be executed before the event processing
>>> tasks.
>>> Note that with the proposed fix, if more than
>>> one lookup service is running, it is possible for
>>> an attribute to "regress", as the lookup services
>>> do not receive a given attribute change at exactly
>>> the same time. The inconsistency will eventually
>>> correct itself as the cache receives each attribute
>>> change event, and so should not be a permanent condition.
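
The ordering problem in steps 5 to 7 of the report can be reproduced without any threads. What follows is only a minimal sketch: SnapshotOrderingSketch, applySnapshot, applyChangeEvent and view are hypothetical names made up for illustration, not the ServiceDiscoveryManager or LookupCache API, and the sketch does not reproduce the attached patch. It shows only that the cache ends up with whichever state is applied last.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the cache; not the River LookupCache API. */
class SnapshotOrderingSketch {
    // Service id -> attribute state as currently seen by the cache.
    private final Map<String, String> attributes = new HashMap<>();

    /** Applies the state from the initial snapshot (assumed helper). */
    void applySnapshot(String serviceId, String state) {
        attributes.put(serviceId, state);
    }

    /** Applies the state carried by an attribute change event (assumed helper). */
    void applyChangeEvent(String serviceId, String state) {
        attributes.put(serviceId, state);
    }

    /** Returns the cache's current view of the service's attributes. */
    String view(String serviceId) {
        return attributes.get(serviceId);
    }

    /** Reported sequence: the newer event is applied, then the older snapshot. */
    static String eventBeforeSnapshot() {
        SnapshotOrderingSketch cache = new SnapshotOrderingSketch();
        cache.applyChangeEvent("svc", "discovered");
        cache.applySnapshot("svc", "unknown"); // overwrites the newer state
        return cache.view("svc");
    }

    /** Intended sequence: the snapshot is applied before any event. */
    static String snapshotBeforeEvent() {
        SnapshotOrderingSketch cache = new SnapshotOrderingSketch();
        cache.applySnapshot("svc", "unknown");
        cache.applyChangeEvent("svc", "discovered");
        return cache.view("svc");
    }

    public static void main(String[] args) {
        System.out.println("event first:    " + eventBeforeSnapshot());
        System.out.println("snapshot first: " + snapshotBeforeEvent());
    }
}
```

In this sketch the event-first ordering leaves the cache at the initial 'unknown' state, while the snapshot-first ordering leaves it at the discovered state, which is the ordering the quoted fix aims to force.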
