hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-12865) WALs may be deleted before they are replicated to peers
Date Sun, 18 Jan 2015 07:25:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281686#comment-14281686 ]

Lars Hofhansl edited comment on HBASE-12865 at 1/18/15 7:24 AM:
----------------------------------------------------------------

Hmm... A couple of thoughts:
# Can we do simple optimistic concurrency control here? At the beginning of the method we check the parent node's cversion (i.e. the number of changes to the children of that znode), and at the end we check it again. If it changed, we either start over inside the method or simply report that no files can be deleted and try again on the next call. (A rough sketch follows this list.)
# Maybe when an RS takes over a queue it should touch all the involved logs first, so they all get a new timestamp? Then they would not be eligible for deletion until they expire again. That would need to happen *before* the queues are moved in ZK. (See the sketch at the end of this comment.)
# Or, since this is only an issue when the *same* region server enumerates the queues and also adds a queue from another RS, we only need coordination between the threads doing this. That is: block the NodeFailoverWorker from claiming any new queues while a cleanup or check is in progress.
# There might also be a more complex problem: queues could be moved *after* we checked, but before we get to the delete code. So we would need to make sure queues are not moved until after we have finished a delete cycle.
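
To make #1 concrete, here is a rough sketch against the raw ZooKeeper client API. The class name and the scanAllQueues() helper are made up (the real cleaner would go through our existing ZK helpers), and whether checking a single parent znode's cversion catches every way a queue can move is exactly the kind of detail that would need verifying:

{code}
import java.util.Collections;
import java.util.Set;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ReplicatedWalSnapshot {
  private final ZooKeeper zk;
  private final String rsZNode; // e.g. "/hbase/replication/rs"

  public ReplicatedWalSnapshot(ZooKeeper zk, String rsZNode) {
    this.zk = zk;
    this.rsZNode = rsZNode;
  }

  /**
   * Returns the WALs referenced by the replication queues, or an empty set if the queues
   * changed while we were scanning; in that case the caller deletes nothing and tries
   * again on the next cleaner run.
   */
  public Set<String> snapshotWals() throws KeeperException, InterruptedException {
    Stat before = zk.exists(rsZNode, false);
    if (before == null) {
      return Collections.emptySet();
    }
    int cversionBefore = before.getCversion(); // number of changes to this znode's children

    Set<String> wals = scanAllQueues(); // the scan itself is not atomic

    Stat after = zk.exists(rsZNode, false);
    if (after == null || after.getCversion() != cversionBefore) {
      // The children of the parent znode changed while we were scanning (queues were
      // added or removed), so our snapshot may be incomplete: report nothing as
      // deletable this round.
      return Collections.emptySet();
    }
    return wals;
  }

  // Hypothetical stand-in for the existing recursive walk of the replication znodes.
  private Set<String> scanAllQueues() throws KeeperException, InterruptedException {
    return Collections.emptySet();
  }
}
{code}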

#1 seems simple enough. #2 should work, but there's no hard guarantee and it means extra NN (NameNode) operations. #3 should also work fine.

#4 is a concern with all these approaches: we need to avoid any changes to the queues from the point we start checking them to the point where we finish the current delete cycle. So this cannot be handled 100% in a LogCleaner alone (we might need to add begin() and end() hooks to the cleaners... Ugh.)
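
And to make #2 concrete, a rough sketch of the "touch the logs first" step on the failover side. WalToucher and the list of WAL paths are made up here; it only assumes the TTL-based log cleaner keys off the files' modification time, and it shows where the extra NN round trips come from:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class WalToucher {
  private WalToucher() {
  }

  /**
   * Bump the modification time of every WAL we are about to claim, *before* the queues
   * are moved in ZK, so the TTL-based log cleaner sees them as fresh again. Costs one
   * NN RPC per file, which is the downside mentioned above.
   */
  public static void touchWals(FileSystem fs, List<Path> walPaths) throws IOException {
    long now = System.currentTimeMillis();
    for (Path wal : walPaths) {
      fs.setTimes(wal, now, -1); // setTimes(path, mtime, atime); -1 leaves atime unchanged
    }
  }
}
{code}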



> WALs may be deleted before they are replicated to peers
> -------------------------------------------------------
>
>                 Key: HBASE-12865
>                 URL: https://issues.apache.org/jira/browse/HBASE-12865
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Liu Shaohui
>
> By design, the ReplicationLogCleaner guarantees that WALs still referenced by a replication queue cannot be deleted by the HMaster. The ReplicationLogCleaner gets the WAL set from ZooKeeper by scanning the replication znodes, but it may get an incomplete WAL set during a replication failover because the scan operation is not atomic.
> For example: there are three region servers, rs1, rs2 and rs3, and a peer with id 10. The layout of the replication ZooKeeper nodes is:
> {code}
> /hbase/replication/rs/rs1/10/wals
>                      /rs2/10/wals
>                      /rs3/10/wals
> {code}
> - t1: the ReplicationLogCleaner finishes scanning the replication queue of rs1 and starts to scan the queue of rs2.
> - t2: region server rs3 goes down, and rs1 takes over rs3's replication queue. The new layout is:
> {code}
> /hbase/replication/rs/rs1/10/wals
>                      /rs1/10-rs3/wals
>                      /rs2/10/wals
>                      /rs3
> {code}
> - t3: the ReplicationLogCleaner finishes scanning the queue of rs2 and starts to scan the node of rs3. But the queue has already been moved to "/hbase/replication/rs/rs1/10-rs3/wals".
> So the ReplicationLogCleaner will miss the WALs of rs3 for peer 10, and the HMaster may delete these WALs before they are replicated to the peer clusters.
> We encountered this problem in our cluster and I think it's a serious bug for replication.
> Suggestions for fixing this bug are welcome. Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
