ignite-issues mailing list archives

From "Pavel Kovalenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-7467) Verify partition update counters and sizes on partition map exchange
Date Thu, 18 Jan 2018 13:11:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Kovalenko updated IGNITE-7467:
------------------------------------
    Description: 
In Ignite we rely heavily on the invariant that, when the cluster is under no load, all copies
of a partition in OWNING state have equal sizes and, more importantly, equal partition update
counters. This invariant becomes even more important when persistence is enabled.

However, a bug in the code could violate this invariant, which in the long run may lead to
undetected data loss. We need to make a best effort to detect such a situation as soon as
possible.

Currently, we already send partition update counters during partition map exchange. We can
also send partition sizes and verify that corresponding partitions in OWNING state have equal
partition update counters and sizes.
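
The check described above could look roughly like the following sketch. It is only an
illustration under stated assumptions: PartitionConsistencyCheck, PartitionState and
findDivergences are hypothetical names, not Ignite classes, and the input is assumed to
already contain only partitions that each node reports in OWNING state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not Ignite's actual classes. */
public class PartitionConsistencyCheck {
    /** State one node reports for one partition it holds in OWNING state. */
    public static final class PartitionState {
        final long updateCounter;
        final long size;

        public PartitionState(long updateCounter, long size) {
            this.updateCounter = updateCounter;
            this.size = size;
        }
    }

    /**
     * Compares every reported copy of a partition against the first copy seen.
     *
     * @param reported nodeId -> (partitionId -> state), OWNING partitions only (assumed).
     * @return one description per divergence found; empty if all copies agree.
     */
    public static List<String> findDivergences(Map<String, Map<Integer, PartitionState>> reported) {
        Map<Integer, PartitionState> reference = new LinkedHashMap<>();
        Map<Integer, String> referenceNode = new LinkedHashMap<>();
        List<String> divergences = new ArrayList<>();

        for (Map.Entry<String, Map<Integer, PartitionState>> nodeEntry : reported.entrySet()) {
            String nodeId = nodeEntry.getKey();

            for (Map.Entry<Integer, PartitionState> partEntry : nodeEntry.getValue().entrySet()) {
                Integer part = partEntry.getKey();
                PartitionState state = partEntry.getValue();
                PartitionState ref = reference.get(part);

                if (ref == null) {
                    // First copy of this partition becomes the reference.
                    reference.put(part, state);
                    referenceNode.put(part, nodeId);
                    continue;
                }

                if (ref.updateCounter != state.updateCounter)
                    divergences.add("Update counter mismatch [part=" + part
                        + ", " + referenceNode.get(part) + "=" + ref.updateCounter
                        + ", " + nodeId + "=" + state.updateCounter + "]");

                if (ref.size != state.size)
                    divergences.add("Size mismatch [part=" + part
                        + ", " + referenceNode.get(part) + "=" + ref.size
                        + ", " + nodeId + "=" + state.size + "]");
            }
        }

        return divergences;
    }
}
```

Comparing against the first reported copy keeps the sketch short; it reports that copies
disagree, but it does not decide which copy is the correct one.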

If a divergence is detected, we can:
1) Always print out an error message to the log
2) Move the corresponding caches to the read-only state to prevent further corruption or operating
on invalid data
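
The two reactions listed above can be sketched as follows. This is a minimal illustration,
not Ignite code: DivergenceHandler, onDivergence and checkWriteAllowed are hypothetical
names, the logger is plain java.util.logging, and the read-only state is modeled as a
simple in-memory set of cache names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

/** Hypothetical sketch; not part of Ignite. */
public class DivergenceHandler {
    private static final Logger LOG = Logger.getLogger(DivergenceHandler.class.getName());

    /** Names of caches that were switched to read-only after a divergence. */
    private final Set<String> readOnlyCaches = ConcurrentHashMap.newKeySet();

    /** Reaction 1: always log an error. Reaction 2: mark the cache read-only. */
    public void onDivergence(String cacheName, String details) {
        LOG.severe("Partition divergence detected [cache=" + cacheName
            + ", details=" + details + "]");

        readOnlyCaches.add(cacheName);
    }

    public boolean isReadOnly(String cacheName) {
        return readOnlyCaches.contains(cacheName);
    }

    /** Would be called before an update; rejects writes to read-only caches. */
    public void checkWriteAllowed(String cacheName) {
        if (isReadOnly(cacheName))
            throw new IllegalStateException("Cache is in read-only state: " + cacheName);
    }
}
```

Reads stay available in this sketch; only updates are rejected, which matches the stated
goal of preventing further corruption.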

Also, we can introduce a ./control.sh command which will trigger an empty exchange to verify
the partition states.

> Verify partition update counters and sizes on partition map exchange
> --------------------------------------------------------------------
>
>                 Key: IGNITE-7467
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7467
>             Project: Ignite
>          Issue Type: Improvement
>          Components: persistence
>    Affects Versions: 2.1
>            Reporter: Alexey Goncharuk
>            Assignee: Pavel Kovalenko
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
