ignite-dev mailing list archives

From Maxim Muzafarov <maxmu...@gmail.com>
Subject Exchange stucks while node restoring state from WAL
Date Fri, 03 Aug 2018 06:44:04 GMT
Hi Igniters,

I'm working on bug [1] and have some questions about the final
implementation. I have probably already found answers to some of
them, but I want to be sure. Please help me clarify the details.

The key problem here is that we read the WAL and restore the
memory state of a newly joined node inside PME. Reading the WAL
can take a huge amount of time, so the whole cluster gets stuck
waiting for a single node.
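
A toy model of the stall described above may help. It is only a
sketch, not Ignite code: the class ExchangeStallSketch, the
NodeWork type, and all timings are hypothetical. It treats the
exchange as a barrier that completes when the slowest node
completes, and compares restoring from the WAL inside the
exchange with restoring before the node joins.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names are Ignite APIs.
class ExchangeStallSketch {
    /** Hypothetical per-node timings, in milliseconds. */
    static final class NodeWork {
        final String name;
        final long walRestoreMs;   // time to replay the WAL (0 for nodes already running)
        final long exchangeStepMs; // time for the node's own part of the exchange

        NodeWork(String name, long walRestoreMs, long exchangeStepMs) {
            this.name = name;
            this.walRestoreMs = walRestoreMs;
            this.exchangeStepMs = exchangeStepMs;
        }
    }

    /**
     * Models the exchange as a barrier: it lasts as long as the slowest node
     * needs for everything that node does inside the exchange.
     */
    static long exchangeDurationMs(List<NodeWork> nodes, boolean restoreInsideExchange) {
        long slowest = 0;
        for (NodeWork n : nodes) {
            long inside = n.exchangeStepMs + (restoreInsideExchange ? n.walRestoreMs : 0);
            slowest = Math.max(slowest, inside);
        }
        return slowest;
    }

    public static void main(String[] args) {
        List<NodeWork> nodes = Arrays.asList(
            new NodeWork("server-1", 0, 50),
            new NodeWork("server-2", 0, 60),
            new NodeWork("joining", 120_000, 40)); // assumed long WAL replay

        System.out.println("restore inside exchange: "
            + exchangeDurationMs(nodes, true) + " ms");
        System.out.println("restore before join:     "
            + exchangeDurationMs(nodes, false) + " ms");
    }
}
```

In this model, moving the WAL replay out of the exchange makes
the exchange duration independent of the joining node's WAL size,
which is the property the questions below are trying to achieve.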

1) Is it correct that readMetastore() happens after the node
starts but before the node is included in the ring?

2) Once onDone() has been called for LocalJoinFuture on the local
node, can we proceed with initiating PME on the local node?

3) After reading the checkpoint and restoring memory for a newly
joined node, how and when do we update obsolete partition update
counters? During historical rebalance, right?

4) Should we perform restoreMemory for a newly joined node before
PME is initiated on the other nodes in the cluster?

5) In our final solution, should readMetastore and restoreMemory
be performed in one step for a newly joined node?

[1] https://issues.apache.org/jira/browse/IGNITE-7196
Maxim Muzafarov
