hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
Date Fri, 28 Oct 2011 22:28:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138865#comment-13138865
] 

Phabricator commented on HBASE-2312:
------------------------------------

nspiegelberg has commented on the revision "HBASE-2312 [jira] Possible data loss when RS goes
into GC pause while rolling HLog".

  sending a new diff.  note that Phabricator has a nifty diff-compare tool for you to make
it easy to see my code changes in response to your commentary  :)

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:223 Prakash did the Exception
handling part of this code, so I'll let him follow up if you need more info.  Basically, we
are still in the initialization phase (we are in the server abort thread right now), where
server abort should be used during normal operation.

REVISION DETAIL
  https://reviews.facebook.net/D99

                
> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: D99.1.patch
>
>
> There is a very corner case when bad things could happen(ie data loss):
> 1)	RS #1 is going to roll its HLog - not yet created the new one, old one will get no
more writes
> 2)	RS #1 enters GC Pause of Death
> 3)	Master lists HLog files of RS#1 that is has to split as RS#1 is dead, starts splitting
> 4)	RS #1 wakes up, created the new HLog (previous one was rolled) and appends an edit
- which is lost
> The following seems like a possible solution:
> 1)	Master detects RS#1 is dead
> 2)	The master renames the /hbase/.logs/<regionserver name>  directory to something
else (say /hbase/.logs/<regionserver name>-dead)
> 3)	Add mkdir support (as opposed to mkdirs) to HDFS - so that a file create fails if
the directory doesn't exist. Dhruba tells me this is very doable.
> 4)	RS#1 comes back up and is not able create the new hlog. It restarts itself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message