hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11094) Distributed log replay is incompatible for rolling restarts
Date Tue, 27 May 2014 04:16:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009261#comment-14009261 ]

Jeffrey Zhong commented on HBASE-11094:

Thanks for the comments!

How does operator know when this has been done?
HBASE-10544 will definitely help. In the meantime, an administrator needs to wait for all
split tasks under znode splitLogZNode to clear, then restart the master, and then all region

I'm talking about the location for RegionServerConfigMismatchException
Any suggestion where I should put it?

Suggest rename as openForReplay. 
Ok. I'll change the name in the next patch.

Or, if there is a crash after the M and the RS have been rolling restarted, only one RS will be able
to open regions. It could take a while for the M to figure this out going by the below
Yes, for regions in recovery. A normal region move/open (one without any recovery work)
will not be affected. Also, a rolling restart of the RSs shouldn't take a long time.

What happens on non-upgraded servers when we pass the code path that this is inserted into?
That is what blocks a rolling upgrade. If both the old and the upgraded code are aware of
the different recovery modes (that is, both include this JIRA's patch), we're fine.

 What would happen in the above scenarios?
The above code makes sure SplitLogWorker only grabs split log tasks intended with the same recovery
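
A minimal sketch of that filtering idea, not the code from the patch: a worker only picks up split log tasks whose recovery mode matches its own. The class, enum, and method names below (SplitTaskFilterSketch, RecoveryMode, SplitTask, tasksToGrab) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitTaskFilterSketch {
    // Hypothetical names for the two recovery modes discussed in this thread.
    enum RecoveryMode { LOG_SPLITTING, LOG_REPLAY }

    // Hypothetical stand-in for a split log task tagged with the mode it was created for.
    static final class SplitTask {
        final String name;
        final RecoveryMode mode;

        SplitTask(String name, RecoveryMode mode) {
            this.name = name;
            this.mode = mode;
        }
    }

    // Return only the pending tasks created for the same recovery mode as the worker.
    static List<SplitTask> tasksToGrab(List<SplitTask> pending, RecoveryMode workerMode) {
        List<SplitTask> matching = new ArrayList<>();
        for (SplitTask task : pending) {
            if (task.mode == workerMode) {
                matching.add(task);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<SplitTask> pending = Arrays.asList(
            new SplitTask("wal-1", RecoveryMode.LOG_REPLAY),
            new SplitTask("wal-2", RecoveryMode.LOG_SPLITTING));
        // A worker configured for log splitting skips the replay task.
        for (SplitTask task : tasksToGrab(pending, RecoveryMode.LOG_SPLITTING)) {
            System.out.println(task.name);
        }
    }
}
```

With a filter of this kind, a worker running in one mode leaves tasks created for the other mode untouched, so they remain available for a worker with the matching mode.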

> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>         Attachments: hbase-11094-v2.patch, hbase-11094.patch
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading the code
> and discussing this with Jeffrey, we realized that the dist log replay code is not compatible
> with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide whether
> the region should be opened in replay mode or not. The open region RPC does not contain that
> info. So if dist log replay is enabled on master, the master will assign the region and schedule
> replay tasks. If the region is opened in a RS that does not have this conf enabled, then it
> will happily open the region in normal mode (not replay mode) causing possible (transient)
> data loss.
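
The mismatch described above can be illustrated with a small sketch. It is not HBase code: the class and method names (OpenRegionSketch, decideFromLocalConf, decideFromRequest, isUnsafe) are hypothetical, and the only name taken from this thread is the openForReplay flag. It contrasts a region server that decides from its own configuration with one that follows a flag carried in the open request.

```java
public class OpenRegionSketch {
    enum OpenMode { NORMAL, REPLAY }

    // Behaviour described in the issue: the region server consults only its own setting.
    static OpenMode decideFromLocalConf(boolean localReplayEnabled) {
        return localReplayEnabled ? OpenMode.REPLAY : OpenMode.NORMAL;
    }

    // Assumed alternative: the master states its intent in the open request.
    static OpenMode decideFromRequest(boolean openForReplay) {
        return openForReplay ? OpenMode.REPLAY : OpenMode.NORMAL;
    }

    // Unsafe case: the master scheduled replay tasks but the region opened in normal mode.
    static boolean isUnsafe(boolean masterScheduledReplay, OpenMode actualMode) {
        return masterScheduledReplay && actualMode == OpenMode.NORMAL;
    }

    public static void main(String[] args) {
        boolean masterReplayEnabled = true;   // master has dist log replay enabled
        boolean serverReplayEnabled = false;  // region server does not

        OpenMode fromConf = decideFromLocalConf(serverReplayEnabled);
        OpenMode fromRequest = decideFromRequest(masterReplayEnabled);

        System.out.println("local conf: " + fromConf + ", unsafe=" + isUnsafe(masterReplayEnabled, fromConf));
        System.out.println("request flag: " + fromRequest + ", unsafe=" + isUnsafe(masterReplayEnabled, fromRequest));
    }
}
```

In this sketch the unsafe combination arises only when the decision comes from the local setting, which is why carrying the intent in the request (or rejecting mismatched configurations) closes the gap.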
