hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11094) Distributed log replay is incompatible for rolling restarts
Date Fri, 23 May 2014 23:58:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007902#comment-14007902 ]

Jeffrey Zhong commented on HBASE-11094:

Thanks [~enis] for the comments!

"the patch attached contains changes to PB generated classes that are not touched by the patch"
That's leftover from other changes. Even if we run the protobuf compiler against the trunk
branch without my changes, the mapreduce changes still show up.

"This should go inside the RegionOpenInfo"
Fixed in v2.

"Once that is thrown, do we retry on a different server? Do we run out of retries?"
It will retry, though I still created a new exception in v2 to make it more explicit.

"0.98 RS's won't execute the new SLW and RSRpcServices changes in the patch"
That's right. The old code doesn't have the change, so we cannot do a rolling upgrade (we
have to stop and restart everything). Some options were discussed with Enis offline, but
they are not clean and would be a one-time effort. It seems we still need to turn it off by
default for 1.0 to support rolling upgrades.

> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>         Attachments: hbase-11094-v2.patch, hbase-11094.patch
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading the code
> and discussing this with Jeffrey, we realized that the dist log replay code is not compatible
> with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide whether
> the region should be opened in replay mode or not. The open region RPC does not contain that
> info. So if dist log replay is enabled on master, the master will assign the region and schedule
> replay tasks. If the region is opened on an RS that does not have this conf enabled, then it
> will happily open the region in normal mode (not replay mode), causing possible (transient)
> data loss.
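
The mismatch described in the issue can be illustrated with a minimal sketch. Everything here is hypothetical and simplified: `ReplayModeSketch`, `OpenRegionRequest`, `openForReplay`, and both `decide...` methods are illustrative names, not HBase classes or the code in the attached patches. The sketch assumes the fix is to carry the master's decision in the open-region request, and that a region server falls back to its own configuration when the field is absent; the actual patch may behave differently.

```java
import java.util.Optional;

// Hypothetical, simplified model of the problem; none of these types are HBase classes.
final class ReplayModeSketch {

    /** Stand-in for the open-region RPC request. */
    static final class OpenRegionRequest {
        final String regionName;
        // Empty means the sender did not set the field (assumed to model an older master).
        final Optional<Boolean> openForReplay;

        OpenRegionRequest(String regionName, Optional<Boolean> openForReplay) {
            this.regionName = regionName;
            this.openForReplay = openForReplay;
        }
    }

    /** Behavior described in the issue: the region server consults only its own configuration. */
    static boolean decideFromLocalConf(boolean localReplayEnabled) {
        return localReplayEnabled;
    }

    /**
     * Assumed shape of a fix: prefer the decision carried in the request,
     * and fall back to the local configuration only when the field is absent.
     */
    static boolean decideFromRequest(OpenRegionRequest request, boolean localReplayEnabled) {
        return request.openForReplay.orElse(localReplayEnabled);
    }

    public static void main(String[] args) {
        // The master has replay enabled and has scheduled replay tasks for this region.
        OpenRegionRequest request = new OpenRegionRequest("region-a", Optional.of(true));
        boolean localReplayEnabled = false; // this region server has the setting turned off

        // Local-conf-only decision: the region opens in normal mode although replay is pending.
        System.out.println("local conf only, replay mode: " + decideFromLocalConf(localReplayEnabled));
        // Request-carried decision: the region opens in replay mode, matching the master.
        System.out.println("request flag, replay mode:    " + decideFromRequest(request, localReplayEnabled));
    }
}
```

Carrying the flag in the request removes the dependence on each server's configuration, but as noted in the comment above, it does not help servers running older code that never reads the field.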

This message was sent by Atlassian JIRA
