ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-9828) MVCC: Continuous query failover.
Date Wed, 31 Oct 2018 11:39:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669984#comment-16669984
] 

Vladimir Ozerov edited comment on IGNITE-9828 at 10/31/18 11:38 AM:
--------------------------------------------------------------------

[~rkondakov], [~gvvinblade], my comments:
1) Currently we have five (!!!) nested loops: group -> partitions -> gaps-> caches
-> listeners. Looks like we'd better to assemble gaps, and then apply them to underlying
caches only once. So it would be: groups -> partitions -> caches -> listeners ->
gaps. This way we will minimize garbage and unnecessary method calls.
2) I think we should not use {{CacheContinuousQueryManager#skipUpdateCounter}}. First, it
generates unnecessary garbage ({{CounterSkipContext}}). Second, it has very tricky logic,
and from what I understand only small subset of the code could potentially be invoked during
the specific case we target. Let's create separate method which will do only necessary things
and put clear assertions in code for the sake of safety and efficiency.
3) {{GridDhtPartitionTopologyImpl#finalizeUpdateCounters}] - we always pass {{backup=false}}.
But don't we filter out non-backup partitions in the first place? Is it ok?
4) {{GridDhtPartitionTopologyImpl.finalizeUpdateCounters}} - topology read lock is obtained
in this method. However, several calls deeper we reach {{GridCacheDataStore#finalizeUpdateCounters}}
where store could be potentially initialized under checkpoint read lock. But we have a strict
policy that checkpoint read lock must be acquired before topology read lock. Otherwise deadlocks
are possible. We need to have an answer to the question: "Is it possible that a store (partition)
will be uninitialized in this place?"
5) {{GridDhtPartitionsExchangeFuture#finalizePartitionCounters}} - {{doInParallel}} could
occupy all system threads. This is safe only assuming that no blocking operation ever exist
in system thread. Our experience shows that this is not always the case. To be on the safe
side we should not occupy all system threads. Instead, let's occupy (cnt - 2), this is how
it is done in other places where {{doInParallel}} overload is used. Be careful with situation
when there are <= 2 system threads. 
6) Once p.1, p.2 and p.4 are addressed, we should re-evaluate whether we need to separate
threads or not. Typically we will have not so many gaps. So if we rule out any heavy operations
and user code execution and make more efficient loops, then there is no need to delegate to
separate threads.


was (Author: vozerov):
[~rkondakov], [~gvvinblade], my comments:
1) Currently we have five (!!!) nested loops: group -> partitions -> gaps-> caches
-> listeners. Looks like we'd better to assemble gaps, and then apply them to underlying
caches only once. So it would be: groups -> partitions -> caches -> listeners ->
gaps. This way we will minimize garbage and unnecessary method calls.
2) I think we should not use {{CacheContinuousQueryManager#skipUpdateCounter}}. First, it
generates unnecessary garbage ({{CounterSkipContext}}). Second, it has very tricky logic,
and from what I understand only small subset of the code could potentially be invoked during
the specific case we target. Let's create separate method which will do only necessary things
and put clear assertions in code for the sake of safety and efficiency.
3) {{GridDhtPartitionTopologyImpl#finalizeUpdateCounters}] - we always pass {{backup-false}}.
But don't we filter out non-backup partitions in the first place? Is it ok?
4) {{GridDhtPartitionTopologyImpl.finalizeUpdateCounters}} - topology read lock is obtained
in this method. However, several calls deeper we reach {{GridCacheDataStore#finalizeUpdateCounters}}
where store could be potentially initialized under checkpoint read lock. But we have a strict
policy that checkpoint read lock must be acquired before topology read lock. Otherwise deadlocks
are possible. We need to have an answer to the question: "Is it possible that a store (partition)
will be uninitialized in this place?"
5) {{GridDhtPartitionsExchangeFuture#finalizePartitionCounters}} - {{doInParallel}} could
occupy all system threads. This is safe only assuming that no blocking operation ever exist
in system thread. Our experience shows that this is not always the case. To be on the safe
side we should not occupy all system threads. Instead, let's occupy (cnt - 2), this is how
it is done in other places where {{doInParallel}} overload is used. Be careful with situation
when there are <= 2 system threads. 
6) Once p.1, p.2 and p.4 are addressed, we should re-evaluate whether we need to separate
threads or not. Typically we will have not so many gaps. So if we rule out any heavy operations
and user code execution and make more efficient loops, then there is no need to delegate to
separate threads.

> MVCC: Continuous query failover.
> --------------------------------
>
>                 Key: IGNITE-9828
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9828
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc
>            Reporter: Roman Kondakov
>            Assignee: Roman Kondakov
>            Priority: Major
>             Fix For: 2.7
>
>
> We need to implement failover procedure for CQ with MVCC caches. Thoughts:
>  * CQ failover tests should be adopted for mvcc scenarios. See {{IgniteCacheQuerySelfTestSuite4}}
>  * We need to ensure the correctness of CQ and partition counters consistency in cases
when some of TX participants are in prepared state and others are marked as rollback only.
It looks like it doesn't work properly now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message