ignite-user mailing list archives

From Cody Yancey <yan...@uber.com>
Subject Reloading from Persistent Store after Losing a Node
Date Thu, 26 Jan 2017 16:14:16 GMT
Hello Ignite users!

I have a use case where I am doing SQL queries on a sharded cache, and I
need to ensure that SQL queries always return The Right Answer even if some
nodes in the ring are lost. As I have rigorously confirmed, SQL queries
only apply to data in the cache (as opposed to in the write-through
persistent store but lost from the cache). Also, when you lose a node, you
don't lose persisted data, but data IS now gone from the cache (unless
there is an in-cache backup of the relevant cache partitions).

Now, I *could* do this by just setting the backup factor for the cache
equal to the number of nodes I can stand to lose, and then setting a
TopologyValidator on the cache to ensure I always have more nodes in the
ring than that number. If the TopologyValidator ever sees the node count
fall below this survivability threshold, I crash the app and let
everything get reloaded from the persistent store when the nodes
automatically start back up.
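
The threshold rule above can be sketched in plain Java without any Ignite
types. This is only an illustration: the class and method names are
hypothetical, and string node IDs stand in for the collection of nodes
that a TopologyValidator receives in its validate method.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the check a TopologyValidator would perform.
final class SurvivabilityThreshold {

    /**
     * Returns true while the ring still has more nodes than the
     * configured number of backups, i.e. while the topology is
     * considered safe under the simple threshold rule.
     */
    static boolean topologyIsSafe(Collection<String> aliveNodeIds, int backups) {
        return aliveNodeIds.size() > backups;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("node-1", "node-2", "node-3", "node-4");
        int backups = 2; // number of nodes we can stand to lose

        // Four nodes alive, two backups: still safe.
        System.out.println(topologyIsSafe(ring, backups));               // true
        // Only two nodes alive: 2 is not greater than 2, so not safe.
        System.out.println(topologyIsSafe(ring.subList(0, 2), backups)); // false
    }
}
```

Note that the rule only counts nodes. It knows nothing about which
partitions the lost nodes actually held.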

This technique has a lot of false positives, where we lose too many nodes,
but slowly enough that Ignite is well able to shift the data around to
avoid data loss, and so we shouldn't have had to crash the app.

Therefore, I would rather be a little smarter about this for the sake of

Ideally, in the TopologyValidator logic, while reads and writes to the
cache are blocked, I would be able to:

1.) Detect when a lost partition has no viable backup,
2.) Reload from the persistent store.

The problem I am facing is, I can't find a clean and efficient way of
figuring out #1 from the information the TopologyValidator gives you.
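
To make #1 concrete, here is a minimal sketch of the check itself, in
plain Java with no Ignite types. It assumes a snapshot of which nodes
owned each partition (primary plus backups) was captured before the
nodes were lost. How to obtain that snapshot from Ignite is exactly the
open question, so the map below is hand-built and every name in it is
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: finds partitions with no surviving copy.
final class LostPartitionDetector {

    /**
     * Returns, in ascending order, the IDs of partitions whose every
     * owner (primary and backups) is contained in lostNodes. A partition
     * with an empty owner set is also reported as lost.
     */
    static List<Integer> partitionsWithoutSurvivors(
            Map<Integer, Set<String>> ownersByPartition, Set<String> lostNodes) {
        List<Integer> lost = new ArrayList<>();
        // Copy into a TreeMap so the result order is deterministic.
        for (Map.Entry<Integer, Set<String>> e
                : new TreeMap<>(ownersByPartition).entrySet()) {
            if (lostNodes.containsAll(e.getValue())) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Three partitions, each with one primary and one backup.
        Map<Integer, Set<String>> owners = Map.of(
                0, Set.of("A", "B"),
                1, Set.of("B", "C"),
                2, Set.of("C", "A"));

        // Losing A alone leaves a copy of every partition.
        System.out.println(partitionsWithoutSurvivors(owners, Set.of("A")));      // []
        // Losing A and B together wipes out partition 0.
        System.out.println(partitionsWithoutSurvivors(owners, Set.of("A", "B"))); // [0]
    }
}
```

Only the partitions this reports would need reloading from the
persistent store. An empty result corresponds to the false-positive
case, where nodes were lost but no data was.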

And even if I could, #2 hangs forever, which makes sense because the cache
isn't readable or writeable until AFTER the topology has been validated.

Has anyone faced a similar challenge who has some wisdom to share? Am I
making this way more complicated than it needs to be?

Thanks in advance,
