ignite-issues mailing list archives

From "Ilya Kasnacheev (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-6965) affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left
Date Thu, 14 Nov 2019 09:21:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974081#comment-16974081 ]

Ilya Kasnacheev commented on IGNITE-6965:
-----------------------------------------

I think this is not a race condition, since AlwaysFailoverSpi will use GridFailoverContextImpl.topVer,
and that value will not change to accommodate node loss for an existing call.

> affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left
> -------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-6965
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6965
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, compute
>    Affects Versions: 2.3
>            Reporter: Alexandr Kuramshin
>            Priority: Major
>         Attachments: IGNITE_6965_affinityCall_with_key_mapping_AlwaysFailoverSpi_node_left.patch
>
>
> When doing {{affinityCall(cacheName, key, callable)}} there is a race between the affinity node leaving and then stopping, and {{AlwaysFailoverSpi}} reaching its maximum number of attempts.
> Suppose the following sequence (more probable when {{grid2.order}} >> {{grid1.order}}):
> 1. {{grid1.affinityCall(cacheName, key, callable)}}
> 2. {{grid1}}: {{key}} mapped to the primary partition on {{grid2}}
> 3. {{grid2.stop()}}
> 4. {{grid1}} receives {{NODE_LEFT}} and updates {{discoCache}}
> 5. {{grid1}}: execution of {{callable}} fails with 'Failed to send job request because remote node left grid (if fail-over is enabled, will attempt fail-over to another node'
> 6. {{grid1}}: {{AlwaysFailoverSpi}} max attempts reached.
> 7. {{grid1.affinityCall}} failed with 'Job failover failed because number of maximum failover attempts for affinity call is exceeded'
> 8. {{grid2}} receives the verified node-left message, then stops.
> The patched {{CacheAffinityCallSelfTest}} reproduces the problem.
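
The sequence above, read together with the comment that the failover context keeps the topology version of the original call, can be illustrated with a toy model. This is only a sketch and not Ignite code: the class PinnedTopologyFailoverSketch, the Cluster type, the hash-based primary() mapping and the pinTopVer flag are all hypothetical names made up for the illustration. It assumes that each failover attempt re-resolves the primary node for the key, either against the version captured when the call started (pinned) or against the current version (shown only for contrast, not as a proposed fix).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types exist in Apache Ignite. */
public class PinnedTopologyFailoverSketch {
    /** Hypothetical cluster that remembers the node list for every topology version. */
    static final class Cluster {
        private final Map<Long, List<String>> history = new HashMap<>();
        private long curVer;

        Cluster(List<String> nodes) {
            history.put(++curVer, new ArrayList<>(nodes));
        }

        /** Records a NODE_LEFT event by creating a new topology version without the node. */
        void nodeLeft(String node) {
            List<String> next = new ArrayList<>(history.get(curVer));
            next.remove(node);
            history.put(++curVer, next);
        }

        long currentVersion() {
            return curVer;
        }

        boolean isAlive(String node) {
            return history.get(curVer).contains(node);
        }

        /** Assumed mapping: primary node for a key is chosen by hash over the nodes of that version. */
        String primary(Object key, long topVer) {
            List<String> nodes = history.get(topVer);
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }
    }

    /**
     * Retries the call up to maxFailoverAttempts times after the first failed send.
     * With pinTopVer == true every attempt maps the key using callTopVer,
     * so a node that has already left keeps being selected.
     */
    static String affinityCall(Cluster cluster, Object key, long callTopVer,
                               int maxFailoverAttempts, boolean pinTopVer) {
        for (int attempt = 0; attempt <= maxFailoverAttempts; attempt++) {
            long ver = pinTopVer ? callTopVer : cluster.currentVersion();
            String target = cluster.primary(key, ver);

            if (cluster.isAlive(target))
                return "executed on " + target;
            // Otherwise the send fails because the remote node left; try to fail over.
        }
        return "failed: maximum failover attempts exceeded";
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster(Arrays.asList("grid1", "grid2"));
        Integer key = 1;                              // hashes to index 1, i.e. grid2

        long callTopVer = cluster.currentVersion();   // steps 1-2: call starts, key maps to grid2
        cluster.nodeLeft("grid2");                    // steps 3-4: grid2 stops, grid1 sees NODE_LEFT

        // Steps 5-7: every attempt resolves grid2 again and the call fails.
        System.out.println(affinityCall(cluster, key, callTopVer, 5, true));
        // Contrast: resolving against the current version would reach grid1.
        System.out.println(affinityCall(cluster, key, callTopVer, 5, false));
    }
}
```

In this model the failure does not depend on timing between attempts: once the call is pinned to the older version, raising the attempt limit does not help, which is consistent with the reading that this is not a race.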



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
