river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: A new implementation of TaskManager
Date Thu, 08 Jul 2010 11:29:02 GMT

I've created an issue on Jira where you can upload patches; I'll commit 
them for you.




Peter Firmstone wrote:
> Actually, it might be easier to just treat that as a separate issue 
> for now, which it mostly is, and do what you had planned with 
> TaskManager. Tackle that one later if you want, when you've had some 
> more time to digest the codebase.  My apologies, the last thing I want 
> to do is scare you off.
> Peter.
> Peter Firmstone wrote:
>> Thanks Patricia, your effort and conviction are much appreciated. 
>> River doesn't have bugs, it's got alligators; they're a little 
>> harder to squash, but good for a challenge.
>> Cheers,
>> Peter.
>> Patricia Shanahan wrote:
>>> I need to study this. I'll comment when I know more.
>>> Patricia
>>> Peter Firmstone wrote:
>>>> Hi Patricia,
>>>> This is an example of some timing difficulties, a bug involving Task.
>>>> Perhaps Task can extend Remote?  Then we can pass them around as 
>>>> distributed objects, which will either be a local piece of proxy 
>>>> code executing or a stub.  That was one advantage of allowing Task 
>>>> to contain its dependencies.  If it's sent elsewhere to other 
>>>> nodes, they can add it to their Task dependencies and the result 
>>>> can be retrieved remotely.  Perhaps with a getResult() method like 
>>>> RunnableFuture has.
>>>> Of course there are other ways, just passing on thoughts & 
>>>> knowledge, for problem solving.
>>>> There seems to be a GC & concurrency bug in DGC 
>>>> (DistributedGarbageCollection) reported on the list; I'll dig up 
>>>> the details and create a JIRA issue for it.  It causes an 
>>>> exported object to be garbage collected before a stub can contact 
>>>> it; this is for a distributed object that isn't registered as a 
>>>> service.  That bug would cause problems for a Remote Task, among 
>>>> other things, if it were to be implemented, so it needs to be fixed.
>>>> Cheers,
>>>> Peter.
>>>> Bob Scheifler (JIRA) wrote:
>>>>>      [ https://issues.apache.org/jira/browse/RIVER-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>> Bob Scheifler reopened RIVER-324:
>>>>> ---------------------------------
>>>>>       Assignee:     (was: Brian Murphy)
>>>>> Original fix had a nasty flaw.  Fix to fix has been attached.
>>>>>> Under certain circumstances, the ServiceDiscoveryManager internal
>>>>>> LookupCache implementation can incorrectly process attribute 
>>>>>> change events before the lookup snapshot is processed.
>>>>>> ------------------------------------------------------------------
>>>>>>                 Key: RIVER-324
>>>>>>                 URL: https://issues.apache.org/jira/browse/RIVER-324
>>>>>>             Project: River
>>>>>>          Issue Type: Bug
>>>>>>          Components: net_jini_lookup
>>>>>>    Affects Versions: AR1
>>>>>>            Reporter: Brian Murphy
>>>>>>            Priority: Minor
>>>>>>             Fix For: AR2
>>>>>>         Attachments: river-324-2.diff, river-324.patch
>>>>>> When an attribute change event is received from the
>>>>>> lookup service between the time the cache registers
>>>>>> the event listener and the initial LookupTask takes
>>>>>> the snapshot of the associated service state, the change event 
>>>>>> can get processed first, which can result in incorrect attribute
>>>>>> state.
>>>>>> This bug has been observed in a currently deployed
>>>>>> system, generally at startup when the services of
>>>>>> the system are changing their attributes from an
>>>>>> initial, 'unknown' state, to a discovered state that is shared 
>>>>>> among those services. What has been
>>>>>> observed is a sequence like the following:
>>>>>> 1. event registration is sent to the lookup service
>>>>>> 2. snapshot is requested (LookupTask is queued)
>>>>>> 3. the lookup service sends back, in the requested
>>>>>>    snapshot, the initial state the service registered
>>>>>>    for itself
>>>>>> 4. the service sends an attribute modification request to the
>>>>>>    lookup service, which sends an
>>>>>>    attribute change event to the cache
>>>>>> 5. before the cache's LookupTask processes the snapshot from 
>>>>>>    the lookup service, the event
>>>>>>    arrives and the event processing thread of the
>>>>>>    cache processes the event containing the latest
>>>>>>    state of the service's attributes.
>>>>>> 6. the cache then processes the snapshot, replacing
>>>>>>    the latest, most up-to-date attribute state with
>>>>>>    the original, initial state reflected in the
>>>>>>    snapshot.
>>>>>> 7. the cache now has an incorrect view of the
>>>>>>    service's state.
>>>>>> Bob Scheifler has implemented a simple fix, which
>>>>>> is (quoting Bob) "to have the LookupTask execute the tasks it 
>>>>>> creates directly, rather than queueing
>>>>>> them." That is, force any pending snapshot processing
>>>>>> tasks to be executed before the event processing
>>>>>> tasks.
>>>>>> Note that with the proposed fix, if more than
>>>>>> one lookup service is running, it is possible for
>>>>>> an attribute to "regress" as the lookup services
>>>>>> do not receive a given attribute change at exactly
>>>>>> the same time, but the inconsistency will eventually correct 
>>>>>> itself as the cache receives each attribute
>>>>>> change event, and so should not be a permanent condition.
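
The ordering problem in the numbered sequence quoted above can be sketched
in a few lines of Java. This is only a toy model under stated assumptions,
not River code: the class SnapshotRaceSketch, the Cache type, the "svc" key
and the "unknown"/"discovered" values are all hypothetical names chosen for
illustration, and the threads are replaced by a plain task queue so the
ordering is deterministic. It contrasts queueing the snapshot processing
with running it directly, which is roughly the idea behind the quoted fix.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified model of the cache in RIVER-324; not River code. */
final class SnapshotRaceSketch {

    /** Toy cache: service ID to the attribute state the cache believes. */
    static final class Cache {
        final Map<String, String> attributes = new HashMap<>();
        final Queue<Runnable> pending = new ArrayDeque<>();

        /** Runs every queued task in FIFO order. */
        void drain() {
            Runnable task;
            while ((task = pending.poll()) != null) {
                task.run();
            }
        }
    }

    /** Problem ordering: snapshot processing is queued and runs after the event. */
    static String queuedSnapshot() {
        Cache cache = new Cache();
        // Steps 2-3: the snapshot (initial state) arrives; its processing is queued.
        cache.pending.add(() -> cache.attributes.put("svc", "unknown"));
        // Steps 4-5: the change event is handled before the queued task runs.
        cache.attributes.put("svc", "discovered");
        // Step 6: the queued snapshot processing overwrites the newer state.
        cache.drain();
        return cache.attributes.get("svc");
    }

    /** Ordering with direct execution: snapshot is applied before the event. */
    static String directSnapshot() {
        Cache cache = new Cache();
        Runnable snapshotTask = () -> cache.attributes.put("svc", "unknown");
        snapshotTask.run(); // executed directly rather than queued
        cache.attributes.put("svc", "discovered");
        cache.drain(); // nothing left to run
        return cache.attributes.get("svc");
    }

    public static void main(String[] args) {
        System.out.println("queued: " + queuedSnapshot()); // prints "queued: unknown"
        System.out.println("direct: " + directSnapshot()); // prints "direct: discovered"
    }
}
```

In the queued variant the cache ends up with the stale initial state, as in
step 7; in the direct variant the later event wins. The real cache involves
multiple threads and lookup services, so this shows only the ordering issue.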
