curator-dev mailing list archives

From "Jay Bae (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CURATOR-76) Adding leader selection ChildReaper recipe
Date Wed, 27 Nov 2013 23:28:35 GMT

     [ https://issues.apache.org/jira/browse/CURATOR-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Bae updated CURATOR-76:
---------------------------

    Description: We hit a serious data corruption issue during a rolling restart of our
ZooKeeper servers, apparently caused by one application that uses the ChildReaper recipe. I am
not sure of the root cause, but my theory is that when multiple instances run the ChildReaper
recipe, they conflict with each other between checking that a path exists and deleting it.
This conflict can cause data corruption. We observed all servers die because of corrupted data,
and we had to manually copy the log/snapshot data and restart them.  (was: We hit a serious
data corruption issue during a rolling restart of our ZooKeeper servers, apparently caused by
one application that uses the ChildReaper recipe. I am not sure of the root cause, but my theory
is that when multiple instances run the ChildReaper recipe, they conflict with each other between
checking that a path exists and deleting it. This conflict can cause data corruption. We observed
all servers die because of corrupted data, and we had to manually copy the log/snapshot data and
restart them.

Also, simply checking whether the znode is empty would not be enough. It would be better
if ChildReaper checked that the node is empty and has not been modified for some amount of time.)
     Issue Type: Improvement  (was: Bug)
        Summary: Adding leader selection ChildReaper recipe  (was: Adding leader selection
and TTL feature in ChildReaper recipe)

> Adding leader selection ChildReaper recipe
> ------------------------------------------
>
>                 Key: CURATOR-76
>                 URL: https://issues.apache.org/jira/browse/CURATOR-76
>             Project: Apache Curator
>          Issue Type: Improvement
>          Components: Recipes
>            Reporter: Jay Bae
>
> We hit a serious data corruption issue during a rolling restart of our ZooKeeper servers,
> apparently caused by one application that uses the ChildReaper recipe. I am not sure of the
> root cause, but my theory is that when multiple instances run the ChildReaper recipe, they
> conflict with each other between checking that a path exists and deleting it. This conflict
> can cause data corruption. We observed all servers die because of corrupted data, and we had
> to manually copy the log/snapshot data and restart them.
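
The reporter's theory is unverified, but the check-then-delete interleaving it describes can be sketched with a toy in-memory model. Everything here is hypothetical and for illustration only: `ToyTree`, `run_interleaved`, `run_with_leader`, and the path `/locks/job-1` are made-up names, and the code uses neither ZooKeeper nor Curator. It does not reproduce the reported corruption; it only shows that several reapers acting on the same stale check collide on delete, while a single elected reaper does not.

```python
class ToyTree:
    """In-memory stand-in for a znode tree (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self.nodes = {}  # path -> set of child names

    def create(self, path):
        self.nodes.setdefault(path, set())

    def is_empty(self, path):
        # True only if the node exists and has no children.
        return path in self.nodes and not self.nodes[path]

    def delete(self, path):
        # Raises KeyError if the node is already gone.
        del self.nodes[path]


def run_interleaved(tree, path, reapers):
    """Every reaper checks first, then every reaper deletes (worst-case interleaving)."""
    wants_delete = [name for name in reapers if tree.is_empty(path)]
    failed = []
    for name in wants_delete:
        try:
            tree.delete(path)
        except KeyError:
            # This reaper acted on a check that another reaper had already invalidated.
            failed.append(name)
    return failed


def run_with_leader(tree, path, reapers, leader):
    """Only the elected leader checks and deletes; the other reapers stay idle."""
    return run_interleaved(tree, path, [r for r in reapers if r == leader])


tree = ToyTree()
tree.create("/locks/job-1")
conflicts_without_leader = run_interleaved(tree, "/locks/job-1", ["a", "b", "c"])

tree.create("/locks/job-1")
conflicts_with_leader = run_with_leader(tree, "/locks/job-1", ["a", "b", "c"], leader="a")
```

In this model, leader selection removes only the reaper-versus-reaper conflict. A single reaper can still race with ordinary clients that recreate or repopulate the node between its check and its delete.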



--
This message was sent by Atlassian JIRA
(v6.1#6144)
