geode-issues mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-542) Race in FunctionService.onMembers can result in hang during member startup
Date Wed, 11 Nov 2015 20:27:11 GMT

    [ https://issues.apache.org/jira/browse/GEODE-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001027#comment-15001027 ]

ASF subversion and git services commented on GEODE-542:
-------------------------------------------------------

Commit a25a662b6e0117b79c0f1987ecf34fd94e73dda1 in incubator-geode's branch refs/heads/feature/GEODE-542
from [~upthewaterspout]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=a25a662 ]

GEODE-542: Send a function response after a CancelException

There was a catch clause of a CancelException that was causing us not to
reply to a function call if a CacheClosedException was thrown from the
function. That caused a hang waiting for replies.
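
As a rough illustration of the change described above, here is a minimal,
self-contained sketch. It does not use Geode's real classes: CancelledException,
ReplyChannel, processAbandoning and processReplying are hypothetical stand-ins
for the CancelException handling and the function reply path, and the actual
commit may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ReplyOnCancelSketch {

    /** Hypothetical stand-in for Geode's CancelException family. */
    static class CancelledException extends RuntimeException {
        CancelledException(String message) {
            super(message);
        }
    }

    /** Hypothetical reply channel; records what would be sent back to the caller. */
    static class ReplyChannel {
        final List<String> sent = new ArrayList<>();

        void reply(String payload) {
            sent.add(payload);
        }
    }

    /** Problematic shape: the cancel is swallowed, so no reply is ever sent. */
    static void processAbandoning(Supplier<String> function, ReplyChannel channel) {
        try {
            channel.reply("result: " + function.get());
        } catch (CancelledException e) {
            // "shutdown caught, abandoning message": the caller never hears back
        }
    }

    /** Fixed shape: the cancel is reported to the caller as an error reply. */
    static void processReplying(Supplier<String> function, ReplyChannel channel) {
        try {
            channel.reply("result: " + function.get());
        } catch (CancelledException e) {
            channel.reply("error: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Simulates a function that runs before the cache exists.
        Supplier<String> failing = () -> {
            throw new CancelledException("A cache has not yet been created.");
        };

        ReplyChannel before = new ReplyChannel();
        processAbandoning(failing, before);

        ReplyChannel after = new ReplyChannel();
        processReplying(failing, after);

        System.out.println("replies before fix: " + before.sent.size());
        System.out.println("replies after fix: " + after.sent.size());
    }
}
```

In the first variant a caller waiting on the channel would wait forever; in the
second it receives an error it can act on.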


> Race in FunctionService.onMembers can result in hang during member startup
> --------------------------------------------------------------------------
>
>                 Key: GEODE-542
>                 URL: https://issues.apache.org/jira/browse/GEODE-542
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Dan Smith
>            Assignee: Dan Smith
>
> I hit this while doing some internal tests of FunctionService. I have a function that
> calls CacheFactory.getAnyInstance(). I was seeing that occasionally, my function would never
> see a reply while a member was starting up.
> Turning on debug logging, I found this in the logs:
> {noformat}
> [fine 2015/10/28 17:15:41.903 PDT clientgemfire2_gluon_2055 <Function Execution Processor2> tid=0x37] shutdown caught, abandoning message: A cache has not yet been created.
> com.gemstone.gemfire.cache.CacheClosedException: A cache has not yet been created.
> 	at com.gemstone.gemfire.cache.CacheFactory.getAnyInstance(CacheFactory.java:292)
> 	at com.gemstone.gemfire.internal.cache.execute.util.RollbackFunction.execute(RollbackFunction.java:82)
> 	at com.gemstone.gemfire.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:194)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:380)
> 	at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:451)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:701)
> 	at com.gemstone.gemfire.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1158)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This seems wrong, because by not replying to the function, the caller can then hang. I
> think this code was intended for use during shutdown, but it also gets hit during startup
> because members are available to process functions before the cache is created. That in itself
> is perhaps problematic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
