ignite-issues mailing list archives

From "Alexei Scherbakov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-10078) Node failure during concurrent partition updates may cause partition desync between primary and backup.
Date Mon, 06 May 2019 16:25:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833971#comment-16833971 ]

Alexei Scherbakov commented on IGNITE-10078:

[~agoncharuk] [~Pavlukhin]

I have finally fixed all remaining issues with failing tests. The remaining blocker is not related
to my change (it is permanently broken in master).

All comments above are addressed.

[~Pavlukhin] I don't totally agree with the _pending update_ terminology, because pending means
not having happened yet. In fact the update did happen but was out of order, so I am sticking with
the current naming. _Gaps_ is good.

Just to clarify, this is an attempt to fix the most dire issues with full and historical rebalancing
that lead to partition desync. A bunch of other related tickets were created and need to be
addressed as soon as the contribution is accepted.

Below is a brief description of the major changes introduced by this contribution:
 # In addition to the partition update counter, a _reservation counter_ was introduced. It is used on
the primary node to address a scenario when the commit happens first on the backup and second on the primary
node, for example with one-phase commit. In such a case, because of the requirement to increment the counter
only when an update is written to WAL, we need some kind of _high watermark_ (HWM). The HWM is used for tracking
pending (not yet applied) updates; without it, in case of primary node failure we might have a wrong counter.
The reservation counter is only incremented on the primary node and is synchronized between partition
owners on PME. The old update counter serves as the _low watermark_ (LWM), pointing to the upper bound
of sequential updates.
 # {{WALHistoricalIterator}} is fixed.
 # Introduced {{RollbackRecord}} to correctly track the state of partition updates: missed updates
do not have a corresponding RollbackRecord, but rolled back transactions (by hand or by tx recovery)
will produce a proper RollbackRecord. It is then used by the historical iterator.
 # Gaps in the update sequence are persisted between checkpoints. This is necessary to determine the correct
update counter (LWM) for rebalancing.
 # Implemented a way to store any metadata in the partition file. A new freelist is introduced: {{PartitionMetaStorageImpl}}.
It is used together with {{CacheFreeListImpl}}.
 # Fixed several issues leading to partition desync during rebalancing, most notably {{GridDhtLocalPartition.rmvQueue}} overflow.
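
To make the LWM/HWM relationship from the first item concrete, here is a minimal sketch of a counter that reserves ranges on the primary and tolerates out-of-order application. It is an illustration under stated assumptions, not the code from this contribution: the class {{UpdateCounterSketch}} and all of its method names are hypothetical, and it assumes each update occupies the half-open counter range (start, start + delta].

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an LWM/HWM update counter; not Ignite's actual implementation. */
final class UpdateCounterSketch {
    /** Low watermark: every update with a counter in (0, lwm] has been applied. */
    private long lwm;

    /** High watermark (reservation counter): only advanced by reserve(), i.e. on the primary. */
    private long hwm;

    /** Updates applied out of order, above the LWM: range start -> range length. */
    private final TreeMap<Long, Long> outOfOrder = new TreeMap<>();

    /** Reserves {@code delta} counters and returns the start of the reserved range. */
    synchronized long reserve(long delta) {
        long start = hwm;
        hwm += delta;
        return start;
    }

    /** Marks the range (start, start + delta] as applied, e.g. after it was written to WAL. */
    synchronized void update(long start, long delta) {
        if (start + delta <= lwm)
            return; // Already covered by the sequential prefix.

        if (start > lwm) {
            // There is a gap below this update: remember it, but do not move the LWM.
            outOfOrder.merge(start, delta, Math::max);
            return;
        }

        lwm = start + delta;

        // Absorb previously out-of-order ranges that are now adjacent to the LWM.
        while (!outOfOrder.isEmpty() && outOfOrder.firstKey() <= lwm) {
            Map.Entry<Long, Long> e = outOfOrder.pollFirstEntry();
            lwm = Math.max(lwm, e.getKey() + e.getValue());
        }
    }

    synchronized long lwm() { return lwm; }

    synchronized long hwm() { return hwm; }

    /** True if some update above the LWM was applied while an earlier one is still missing. */
    synchronized boolean hasGaps() { return !outOfOrder.isEmpty(); }
}
```

In this sketch, reserving two single-update ranges and applying the second one first leaves the LWM at 0 with a recorded gap; applying the first one afterwards closes the gap and moves the LWM up to the HWM.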


> Node failure during concurrent partition updates may cause partition desync between primary
and backup.
> -------------------------------------------------------------------------------------------------------
>                 Key: IGNITE-10078
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10078
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.8
> This is possible if some updates are not written to WAL before node failure. They will
not be applied by rebalancing due to the same partition counters in a certain scenario:
> 1. Start grid with 3 nodes, 2 backups.
> 2. Preload some data to partition P.
> 3. Start two concurrent transactions, each writing a single key to the same partition P; the keys
are different
> {noformat}
> try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
>       client.cache(DEFAULT_CACHE_NAME).put(k, v);
>       tx.commit();
> }
> {noformat}
> 4. Order updates on the backup in such a way that the update with the max partition counter is written
to WAL and the update with the lesser partition counter fails because the FH is triggered before it is
added to WAL
> 5. Return the failed node to the grid, observe no rebalancing due to the same partition counters.
> Possible solution: detect gaps in update counters on recovery and force rebalance from
a node without gaps if any are detected.
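
The possible solution above can be sketched roughly as follows. This is only an illustration of gap detection, not the recovery code from the patch: the class {{GapDetectionSketch}} and its methods are hypothetical, and it assumes that counters start at 1, that every update takes exactly one counter, and that the counters recovered from WAL are available as a plain collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of gap detection on recovery; not Ignite's actual recovery code. */
final class GapDetectionSketch {
    /**
     * Returns the missing counters as inclusive [from, to] ranges,
     * looking only between 1 and the highest counter that was recovered.
     */
    static List<long[]> findGaps(Iterable<Long> recoveredCounters) {
        TreeSet<Long> sorted = new TreeSet<>();

        for (Long cntr : recoveredCounters)
            sorted.add(cntr);

        List<long[]> gaps = new ArrayList<>();
        long expected = 1; // Assumption: the first update gets counter 1.

        for (long cntr : sorted) {
            if (cntr > expected)
                gaps.add(new long[] {expected, cntr - 1});

            expected = cntr + 1;
        }

        return gaps;
    }

    /** A node with gaps cannot rely on its max counter and should be rebalanced from a node without gaps. */
    static boolean needsRebalance(Iterable<Long> recoveredCounters) {
        return !findGaps(recoveredCounters).isEmpty();
    }
}
```

In the scenario from the steps above, the failed backup would recover the update with the max counter but not the one before it, so the max counters match across nodes while {{needsRebalance}} still reports a gap.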
