river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: A new implementation of TaskManager
Date Thu, 08 Jul 2010 09:56:28 GMT
Thanks Patricia, your effort and conviction are much appreciated. River 
doesn't have bugs, it's got alligators; they're a little harder to 
squash, but good for a challenge.



Patricia Shanahan wrote:
> I need to study this. I'll comment when I know more.
> Patricia
> Peter Firmstone wrote:
>> Hi Patricia,
>> This is an example of some timing difficulties, a bug involving Task.
>> Perhaps Task can extend Remote?  Then we can pass them around as 
>> distributed objects, which will either be a local piece of proxy code 
>> executing or a stub.  That was one advantage of allowing Task to 
>> contain its dependencies.  If it's sent elsewhere to other nodes, 
>> they can add it to their Task dependencies and the result can be 
>> retrieved remotely.  Perhaps with a getResult() method like 
>> RunnableFuture has.
>> Of course there are other ways, just passing on thoughts & knowledge, 
>> for problem solving.
>> There seems to be a GC & concurrency bug in DGC 
>> (DistributedGarbageCollection) reported on the list; I'll dig up the 
>> details and create a JIRA issue for it.  It causes an exported 
>> object to be garbage collected before a stub can contact it; this is 
>> for a distributed object that isn't registered as a service.  Among 
>> other things, that bug would cause problems for a Remote Task if one 
>> were implemented, so it needs to be fixed.
>> Cheers,
>> Peter.
>> Bob Scheifler (JIRA) wrote:
>>>      [ https://issues.apache.org/jira/browse/RIVER-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>> Bob Scheifler reopened RIVER-324:
>>> ---------------------------------
>>>       Assignee:     (was: Brian Murphy)
>>> Original fix had a nasty flaw.  Fix to fix has been attached.
>>>> Under certain circumstances, the ServiceDiscoveryManager internal 
>>>> LookupCache implementation can incorrectly process attribute 
>>>> change events before the lookup snapshot is processed.
>>>> ----------------------------------------------------------------
>>>>                 Key: RIVER-324
>>>>                 URL: https://issues.apache.org/jira/browse/RIVER-324
>>>>             Project: River
>>>>          Issue Type: Bug
>>>>          Components: net_jini_lookup
>>>>    Affects Versions: AR1
>>>>            Reporter: Brian Murphy
>>>>            Priority: Minor
>>>>             Fix For: AR2
>>>>         Attachments: river-324-2.diff, river-324.patch
>>>> When an attribute change event is received from the
>>>> lookup service between the time the cache registers
>>>> the event listener and the initial LookupTask takes
>>>> the snapshot of the associated service state, the
>>>> change event can get processed first, which can
>>>> result in incorrect attribute state.
>>>> This bug has been observed in a currently deployed
>>>> system, generally at startup when the services of
>>>> the system are changing their attributes from an
>>>> initial, 'unknown' state, to a discovered state that
>>>> is shared among those services. What has been
>>>> observed is a sequence like the following:
>>>> 1. event registration is sent to the lookup service
>>>> 2. snapshot is requested (LookupTask is queued)
>>>> 3. the lookup service sends back in the requested
>>>>    snapshot, the initial state the service registered
>>>>    for itself
>>>> 4. the service sends an attribute modification
>>>>    request to the lookup service, which sends an
>>>>    attribute change event to the cache
>>>> 5. before the cache's LookupTask processes the
>>>>    snapshot from the lookup service, the event
>>>>    arrives and the event processing thread of the
>>>>    cache processes the event containing the latest
>>>>    state of the service's attributes.
>>>> 6. the cache then processes the snapshot, replacing
>>>>    the latest, most up-to-date attribute state with
>>>>    the original, initial state reflected in the
>>>>    snapshot.
>>>> 7. the cache now has an incorrect view of the
>>>>    service's state.
>>>> Bob Scheifler has implemented a simple fix, which
>>>> is (quoting Bob) "to have the LookupTask execute the
>>>> tasks it creates directly, rather than queueing
>>>> them." That is, force any pending snapshot processing
>>>> tasks to be executed before the event processing
>>>> tasks.
>>>> Note that with the proposed fix, if more than
>>>> one lookup service is running, it is possible for
>>>> an attribute to "regress" as the lookup services
>>>> do not receive a given attribute change at exactly
>>>> the same time, but the inconsistency will eventually
>>>> correct itself as the cache receives each attribute
>>>> change event, and so should not be a permanent condition.
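
The ordering problem in steps 1-7 of the RIVER-324 report, and the effect of
executing the snapshot task directly instead of queueing it, can be sketched
in a few lines of Java. This is only an illustration under stated
assumptions: Cache, applySnapshot, applyEvent and run are hypothetical names
invented here, not the ServiceDiscoveryManager code, and the interleaving is
simulated on a single thread rather than with the real task queue and event
thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, single-threaded simulation of the RIVER-324 ordering problem.
// None of these names come from the River code base.
class SnapshotOrdering {

    /** Stand-in for the cache's view of one service's attribute state. */
    static final class Cache {
        String attributes = null;
        void applySnapshot(String state) { attributes = state; }
        void applyEvent(String state)    { attributes = state; }
    }

    /**
     * Replays the reported sequence and returns the cache's final view.
     * executeDirectly == false: the snapshot task is queued (original behaviour).
     * executeDirectly == true:  the snapshot task runs immediately (the quoted fix).
     */
    static String run(boolean executeDirectly) {
        Cache cache = new Cache();
        Queue<Runnable> pending = new ArrayDeque<>();

        // Steps 2-3: the lookup service returns a snapshot holding the
        // service's initial state.
        String snapshot = "unknown";
        Runnable snapshotTask = () -> cache.applySnapshot(snapshot);
        if (executeDirectly) {
            snapshotTask.run();        // processed before any later event
        } else {
            pending.add(snapshotTask); // left waiting behind the event
        }

        // Steps 4-5: the attribute change event arrives and is processed.
        cache.applyEvent("discovered");

        // Step 6: any queued snapshot task finally runs.
        while (!pending.isEmpty()) {
            pending.remove().run();
        }
        return cache.attributes;
    }

    public static void main(String[] args) {
        System.out.println("queued: " + run(false)); // prints "queued: unknown"
        System.out.println("direct: " + run(true));  // prints "direct: discovered"
    }
}
```

In the queued case the stale snapshot overwrites the newer event state, which
is step 7 of the report. The sketch models a single lookup service only, so it
does not show the temporary "regress" that the report says is still possible
when several lookup services are running.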
