zookeeper-dev mailing list archives

From Aishwarya Ganesan <ash8as...@gmail.com>
Subject Crash on detecting a corruption
Date Fri, 06 Jan 2017 20:38:57 GMT

We are looking at how ZooKeeper handles silent data corruptions resulting
from underlying problems in disks and file systems atop them [1,2].

We set up a 3-node ZooKeeper cluster and introduced silent data corruptions
into different blocks of the on-disk files. In all cases, ZooKeeper was
able to detect corruptions in the log file using checksums.
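
For illustration, here is a minimal sketch of the kind of checksum check we
are describing. It is not ZooKeeper's code: the record layout (8-byte
checksum, 4-byte length, payload), the function names, and the choice of
Adler-32 are assumptions made only for this example.

```python
import struct
import zlib

# Hypothetical record layout: 8-byte checksum, 4-byte length, then payload.
HEADER = struct.Struct(">qi")


def write_record(payload: bytes) -> bytes:
    """Frame a payload with its Adler-32 checksum and length."""
    return HEADER.pack(zlib.adler32(payload), len(payload)) + payload


def read_record(record: bytes) -> bytes:
    """Return the payload, or raise IOError if the checksum does not match."""
    checksum, length = HEADER.unpack_from(record, 0)
    payload = record[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.adler32(payload) != checksum:
        raise IOError("checksum mismatch: record is corrupted")
    return payload


def flip_bit(record: bytes, offset: int) -> bytes:
    """Simulate a silent corruption by flipping one bit at the given offset."""
    damaged = bytearray(record)
    damaged[offset] ^= 0x01
    return bytes(damaged)
```

Reading back a record produced by `write_record` succeeds, while reading
`flip_bit(record, HEADER.size)` raises `IOError`. At that point a reader can
either stop or fetch the entry from another replica, which is the choice our
question is about.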

However, on detecting a corruption, the ZooKeeper node on which the
corruption occurred crashes instead of trying to repair the corrupted data
automatically from the replicas. Why does ZooKeeper not repair the corrupted
entry using the replicas? What is the reason for this design decision?
Any insight into this would be appreciated.

[1] https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
[2] http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

