zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Edward Ribeiro (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-2688) rmr leads to "Node does not exist"
Date Thu, 09 Feb 2017 14:03:42 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859552#comment-15859552
] 

Edward Ribeiro edited comment on ZOOKEEPER-2688 at 2/9/17 2:03 PM:
-------------------------------------------------------------------

Hi [~ror6ax], 

The {{rmr}} calls a utility method to recursively delete a root znode and its children. This
method operates in two steps: a) first it traverses the tree from the root path and stores
the absolute paths for root and children in a list. b) then it traverses the list, calling
ZK delete method for each absolute path.

Based on the znode path you posted only, I *guess* that is happening a *race condition* as
below:

0. Suppose that {{/vault}} has a lot of children znodes, many levels deep (this is not really
necessary if the actions interleaving is just precise).

1. Also, *suppose*  {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}}
is an ephemeral node (it has the word "lock" in its path, so I think it's been used for distributed
locking). The session that owns this ephemeral node has been closed, then znode will be removed;

2. The tree traversal method I cited above gets {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}}
and adds the path to the list *just before* it has been removed (miliseconds before).

3. ZK deletes {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}};

4. the list of paths to be delete is traversed (remember: a long list) and *eventually* reaches
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} but when it try
to delete it there is an exception because it has already been deleted by ZK itself;

Does it make sense?

Please, it's just my theory for what is happening.

PS: Even though {{rmr}} is being deprecated in favor of {{deleteAll}}, it also uses the same
utility class so this behavior could potentially happen with with {{deleteAll}}.

*OTOH*, it can indeed be some sort of inconsistency. Did you try to issue a ls or get command
on {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} *after the error*
to see if it really exists? Or try to remove it via delete?


was (Author: eribeiro):
Hi [~ror6ax], 

The rmr calls a utility method to recursively delete a root znode and its children. This method
operates in two steps: a) first it traverses the tree from the root path and stores the absolute
paths for root and children in a list. b) it traverses the list and calling ZK delete method
for each path.

Based on the znode path you posted only, I *guess* that is happening a race condition as below:

0. Suppose that {{/vault}} is has a lot of children znodes, many levels deep.

1. Also, *suppose*  {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}}
is an ephemeral node (it has the word "lock" in its path, so it's been used for distributed
locking). The session that owns this ephemeral node has been closed, then znode will be removed;

2. The tree traversal method I cited above gets {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}}
and adds the path to the list *just before* it has been removed (miliseconds before).

3. ZK deletes {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}};

4. the list of paths to be delete is traversed (remember: a long list) and *eventually* reaches
{{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} but when it try
to delete it there is an exception because it has already been deleted by ZK itself;

Does it make sense?

Please, it's just my theory for what is happening.

PS: Even though {{rmr}} is being deprecated in favor of {{deleteAll}}, it also uses the same
utility class so this behavior could potentially happen with with {{deleteAll}}.

*OTOH*, it can indeed be some sort of inconsistency. Did you try to issue a ls or get command
on {{/vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028}} *after the error*
to see if it really exists? Or try to remove it via delete?

> rmr leads to "Node does not exist"
> ----------------------------------
>
>                 Key: ZOOKEEPER-2688
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2688
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.9
>            Reporter: Gregory Reshetniak
>
> Issuing rmr /vault leads to Node does not exist: /vault/core/_lock/_c_e393e8a4d2c984178373be528a25404a-lock-0000000028
> I know that rmr is getting deprecated in next version, but I think this might be cluster
consistency bug.
> Please advice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message