river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: A new implementation of TaskManager
Date Thu, 08 Jul 2010 02:16:52 GMT
Hi Patricia,

The JIRA issue quoted below is an example of some timing difficulties: 
a bug involving Task.

Perhaps Task can extend Remote?  Then we can pass Tasks around as 
distributed objects; each will be either a local piece of proxy code 
executing or a stub.  That was one advantage of allowing Task to contain 
its dependencies.  If it's sent elsewhere to other nodes, they can add 
it to their Task dependencies and the result can be retrieved remotely, 
perhaps with a getResult() method, similar to the get() method that 
RunnableFuture inherits from Future.
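
To make the idea a little more concrete, here is a minimal sketch of what 
a Remote Task might look like.  RemoteTask and LocalRemoteTask are 
hypothetical names invented for illustration; they are not part of River 
or of TaskManager.  Only java.rmi.Remote and java.util.concurrent.FutureTask 
are real JDK types.  Exporting the object so other nodes receive a stub is 
left out on purpose.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical interface, assumed for illustration only.
// Every method of a Remote interface must declare RemoteException.
interface RemoteTask<V> extends Remote {
    // Executes the task on the node that holds the real object.
    void run() throws RemoteException;

    // Blocks until the task has run, then returns its result.
    // For real remote use, V would have to be serializable.
    V getResult() throws RemoteException, InterruptedException, ExecutionException;
}

// Local implementation backed by FutureTask, which is a RunnableFuture.
final class LocalRemoteTask<V> implements RemoteTask<V> {
    private final FutureTask<V> future;

    LocalRemoteTask(Callable<V> work) {
        this.future = new FutureTask<>(work);
    }

    @Override
    public void run() {
        future.run();
    }

    @Override
    public V getResult() throws InterruptedException, ExecutionException {
        return future.get();
    }
}
```

A caller holding either the local object or a stub would call run() and 
then getResult(); whether the dependency handling belongs inside the Task 
itself is a separate design question.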

Of course there are other ways, just passing on thoughts & knowledge, 
for problem solving.

There seems to be a GC & concurrency bug in DGC 
(Distributed Garbage Collection) reported on the list; I'll dig up the 
details and create a JIRA issue for it.  It causes an exported 
object to be garbage collected before a stub can contact it; this affects 
a distributed object that isn't registered as a service.  That bug would 
cause problems for a Remote Task, among other things, if one were 
implemented, so it needs to be fixed.



Bob Scheifler (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/RIVER-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> Bob Scheifler reopened RIVER-324:
> ---------------------------------
>       Assignee:     (was: Brian Murphy)
> Original fix had a nasty flaw.  Fix to fix has been attached.
>> Under certain circumstances, the ServiceDiscoveryManager internal LookupCache implementation
>> can incorrectly process attribute change events before the lookup snapshot is processed.
>> ------------------------------------------------------------------------
>>                 Key: RIVER-324
>>                 URL: https://issues.apache.org/jira/browse/RIVER-324
>>             Project: River
>>          Issue Type: Bug
>>          Components: net_jini_lookup
>>    Affects Versions: AR1
>>            Reporter: Brian Murphy
>>            Priority: Minor
>>             Fix For: AR2
>>         Attachments: river-324-2.diff, river-324.patch
>> When an attribute change event is received from the
>> lookup service between the time the cache registers
>> the event listener and the initial LookupTask takes
>> the snapshot of the associated service state, the 
>> change event can get processed first, which can 
>> result in incorrect attribute state.
>> This bug has been observed in a currently deployed
>> system, generally at startup when the services of
>> the system are changing their attributes from an
>> initial, 'unknown' state, to a discovered state 
>> that is shared among those services. What has been
>> observed is a sequence like the following:
>> 1. event registration is sent to the lookup service
>> 2. snapshot is requested (LookupTask is queued)
>> 3. the lookup service sends back, in the requested
>>    snapshot, the initial state the service registered
>>    for itself
>> 4. the service sends an attribute modification 
>>    request to the lookup service, which sends an
>>    attribute change event to the cache
>> 5. before the cache's LookupTask processes the 
>>    snapshot from the lookup service, the event
>>    arrives and the event processing thread of the
>>    cache processes the event containing the latest
>>    state of the service's attributes.
>> 6. the cache then processes the snapshot, replacing
>>    the latest, most up-to-date attribute state with
>>    the original, initial state reflected in the
>>    snapshot.
>> 7. the cache now has an incorrect view of the
>>    service's state.
>> Bob Scheifler has implemented a simple fix, which
>> is (quoting Bob), "to have the LookupTask execute 
>> the tasks it creates directly, rather than queueing
>> them." That is, force any pending snapshot processing
>> tasks to be executed before the event processing
>> tasks.
>> Note that with the proposed fix, if more than
>> one lookup service is running, it is possible for
>> an attribute to "regress" as the lookup services
>> do not receive a given attribute change at exactly
>> the same time, but the inconsistency will eventually 
>> correct itself as the cache receives each attribute
>> change event, and so should not be a permanent 
>> condition.
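
The ordering problem in steps 1-7 above, and the effect of running the 
snapshot processing directly instead of queueing it, can be replayed with 
a small single-threaded toy model.  This is only a sketch: ToyLookupCache 
and SnapshotOrderingDemo are hypothetical names, not the 
ServiceDiscoveryManager code or the attached patch, and the model ignores 
real threads and multiple lookup services.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: hypothetical names, not the ServiceDiscoveryManager code.
final class ToyLookupCache {
    private String attributes = "none";
    private final Queue<Runnable> pending = new ArrayDeque<>();

    String attributes() { return attributes; }

    // Event processing thread: applies an attribute change event.
    void onAttributeEvent(String newState) { attributes = newState; }

    // Original behaviour: snapshot processing is queued as a separate task.
    void processSnapshotQueued(String snapshot) {
        pending.add(() -> attributes = snapshot);
    }

    // Fixed behaviour: snapshot processing is executed directly.
    void processSnapshotDirect(String snapshot) { attributes = snapshot; }

    // Runs whatever snapshot tasks are still queued.
    void runPending() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }
}

final class SnapshotOrderingDemo {
    // Steps 2-6 with queueing: the stale snapshot overwrites the event.
    static String replayQueued() {
        ToyLookupCache cache = new ToyLookupCache();
        cache.processSnapshotQueued("unknown"); // snapshot obtained, work queued
        cache.onAttributeEvent("discovered");   // event processed first
        cache.runPending();                     // stale snapshot applied last
        return cache.attributes();
    }

    // Same arrival order, but the snapshot is applied before the event.
    static String replayDirect() {
        ToyLookupCache cache = new ToyLookupCache();
        cache.processSnapshotDirect("unknown");
        cache.onAttributeEvent("discovered");
        cache.runPending();                     // nothing left to run
        return cache.attributes();
    }

    public static void main(String[] args) {
        System.out.println("queued: " + replayQueued()); // queued: unknown
        System.out.println("direct: " + replayDirect()); // direct: discovered
    }
}
```

In the queued replay the cache ends with the initial 'unknown' state, 
which is the incorrect view described in step 7; in the direct replay the 
later event wins.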
