ignite-issues mailing list archives

From "Pavel Kovalenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-10799) Optimize affinity initialization/re-calculation
Date Mon, 24 Dec 2018 08:29:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Kovalenko updated IGNITE-10799:
-------------------------------------
    Affects Version/s:     (was: 2.1)
                       2.4

> Optimize affinity initialization/re-calculation
> -----------------------------------------------
>
>                 Key: IGNITE-10799
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10799
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Assignee: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.8
>
>
> When persistence is enabled and a baseline is set, there are two main code paths that
> recalculate affinity:
> {noformat}
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerJoinWithExchangeMergeProtocol
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerLeftWithExchangeMergeProtocol
> {noformat}
> Both of them follow the same recalculation approach:
> 1) Take the current baseline (ideal assignment).
> 2) Filter out offline nodes from it.
> 3) Choose new primary nodes if the previous ones went away.
> 4) Place the temporary primary nodes into the late affinity assignment set.
> Looking at the implementation details, we can see that it performs many unnecessary
> lookups in the online nodes cache and many array list copies. Recalculation becomes too slow
> for replicated caches: it costs on the order of P * N operations on each node, where P is the
> partition count and N is the number of nodes in the cluster. With a large partition count or a
> large cluster it may take a few seconds, which is unacceptable because this process happens
> during PME (partition map exchange) and freezes ongoing cluster operations.
> We should investigate possible bottlenecks and improve the performance of affinity recalculation.
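
The four steps described above can be sketched roughly as follows. This is a simplified illustration and not the code in CacheAffinitySharedManager: the class name AffinityRecalcSketch, the Result holder, and the use of plain strings as node IDs are all assumptions made for the example, and the real implementation may take additional state into account when choosing a new primary. The inner loop visits every owner of every partition, which for a replicated cache (each partition owned by all N nodes) gives the P * N cost mentioned in the description. The online nodes are passed in as a precomputed set so that each membership check is a constant-time lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and types are illustrative, not Ignite's API.
final class AffinityRecalcSketch {
    /** Recalculated assignment plus the partitions that currently have a temporary primary. */
    static final class Result {
        final List<List<String>> assignment = new ArrayList<>();
        final Set<Integer> lateAffinityParts = new HashSet<>();
    }

    /**
     * @param ideal       ideal (baseline) assignment: for each partition, owners with the primary first
     * @param onlineNodes IDs of nodes that are currently online
     */
    static Result recalculate(List<List<String>> ideal, Set<String> onlineNodes) {
        Result res = new Result();

        for (int p = 0; p < ideal.size(); p++) {
            // Step 1: take the ideal assignment for this partition.
            List<String> owners = ideal.get(p);

            // Step 2: filter out offline nodes, preserving the original order.
            List<String> alive = new ArrayList<>(owners.size());
            for (String node : owners) {
                if (onlineNodes.contains(node))
                    alive.add(node);
            }

            // Step 3: the first surviving owner acts as primary (a simplifying assumption).
            // Step 4: if it differs from the ideal primary, remember the partition
            // so the ideal primary can be restored later.
            if (!alive.isEmpty() && !alive.get(0).equals(owners.get(0)))
                res.lateAffinityParts.add(p);

            res.assignment.add(alive);
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> ideal = Arrays.asList(
            Arrays.asList("A", "B", "C"),
            Arrays.asList("B", "C", "A"),
            Arrays.asList("C", "A", "B"));

        // Node B is offline in this example.
        Set<String> online = new HashSet<>(Arrays.asList("A", "C"));

        Result res = recalculate(ideal, online);
        System.out.println("assignment = " + res.assignment);
        System.out.println("partitions with temporary primary = " + res.lateAffinityParts);
    }
}
```

In this example only partition 1 loses its ideal primary (B), so it is the only one recorded for late affinity assignment. Even in this reduced form, each partition allocates a fresh list, which hints at why list copies add up when P and N are large.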



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
