hbase-issues mailing list archives

From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
Date Fri, 28 Oct 2011 04:36:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138002#comment-13138002 ]

Phabricator commented on HBASE-2312:

stack has requested changes to the revision "HBASE-2312 [jira] Possible data loss when RS
goes into GC pause while rolling HLog".

  Looks good.  A few minor items.

  src/main/java/org/apache/hadoop/hbase/HConstants.java:188 Should this define be up here?
Why not down in HLog?  (I favor keeping defines in the classes they pertain to, though that
could get ugly if we need to reach across packages to get at a define.  Would the master package
need to reach down into the wal package?  If so, I suppose leave it up here in HConstants.)
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:192 The former code put
the return in braces.  I'd say either put the return on the same line as the if, or add the
braces back.
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:223 Why not call server
abort?  That's the usual idiom.
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:227 Why this?  Also, 30
seconds is a long time; should it be configurable?
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:254 I suppose this is not
the end of the world.  A regionserver could add a new log, but the chance of a failed rename
coinciding with the RS adding a new WAL is probably fairly low.
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java:126 Superfluous
logging, I'd say, especially at INFO level.  Log only when we DON'T have this feature (maybe at
DEBUG: if every file open emits one of these INFO logs, it's going to generate lots of queries on
the mailing lists, etc.  At DEBUG it won't seem as critical.)
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java:831 Nice
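
The two MasterFileSystem comments (lines 223 and 227) suggest an idiom that can be sketched as follows, assuming nothing about the actual patch: the wait is read from configuration, with 30 seconds kept only as the default, and an unrecoverable failure is handed to an abort callback instead of being handled locally. The configuration key, the AbortHook interface and the helper names are hypothetical stand-ins; HBase's own configuration and abort interfaces are deliberately not used, so this illustrates the pattern rather than the real API.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: a configurable wait instead of a hard-coded 30 seconds,
 * plus routing unrecoverable failures to an abort callback.
 * WAIT_KEY and AbortHook are illustrative assumptions, not HBase APIs.
 */
public class ConfigurableWaitSketch {
    /** Hypothetical configuration key; not a real HBase property. */
    static final String WAIT_KEY = "example.master.logdir.rename.wait.ms";
    /** Default mirrors the 30 seconds questioned in the review. */
    static final long DEFAULT_WAIT_MS = 30_000L;

    /** Minimal stand-in for a server abort callback (hypothetical). */
    interface AbortHook {
        void abort(String why, Throwable cause);
    }

    /** Returns the configured wait in milliseconds, or the default if unset. */
    static long waitMillis(Properties conf) {
        String raw = conf.getProperty(WAIT_KEY);
        if (raw == null) {
            return DEFAULT_WAIT_MS;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(WAIT_KEY + " must be >= 0, got " + value);
        }
        return value;
    }

    /** Runs an action; on a RuntimeException, calls the abort hook and returns false. */
    static boolean runOrAbort(Runnable action, AbortHook hook) {
        try {
            action.run();
            return true;
        } catch (RuntimeException e) {
            hook.abort("Unrecoverable failure handling log directory", e);
            return false;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(waitMillis(conf));   // 30000
        conf.setProperty(WAIT_KEY, "5000");
        System.out.println(waitMillis(conf));   // 5000
        boolean ok = runOrAbort(
            () -> { throw new IllegalStateException("rename failed"); },
            (why, cause) -> System.out.println("abort: " + why + " (" + cause.getMessage() + ")"));
        System.out.println(ok);                 // false
    }
}
```

Routing the failure through one abort callback keeps the "what do we do when this breaks" decision in a single place, which is what the "usual idiom" remark points at.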


> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.0
>         Attachments: D99.1.patch
> There is a rare corner case in which bad things (i.e. data loss) could happen:
> 1)	RS #1 is about to roll its HLog: it has not yet created the new one, and the old one will get no more writes
> 2)	RS #1 enters a GC Pause of Death
> 3)	The master, treating RS #1 as dead, lists the HLog files of RS #1 that it has to split and starts splitting
> 4)	RS #1 wakes up, creates the new HLog (the previous one was rolled) and appends an edit, which is lost
> The following seems like a possible solution:
> 1)	The master detects that RS #1 is dead
> 2)	The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead)
> 3)	Add mkdir support (as opposed to mkdirs) to HDFS, so that a file create fails if the directory doesn't exist. Dhruba tells me this is very doable.
> 4)	RS #1 comes back up and is not able to create the new HLog. It restarts itself.
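
The proposed fix fences the paused server through the filesystem. A minimal sketch of the protocol, assuming java.nio.file on a local disk as a stand-in for HDFS (it is not the HDFS API, and the directory layout and method names are hypothetical): Files.createFile, like the proposed mkdir-style create, does not create missing parent directories, so once the master has renamed the directory away the late server's create fails. The recursive variant shows why mkdirs semantics would let the lost edit through.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Local-filesystem sketch of rename-based log directory fencing.
 * java.nio.file stands in for HDFS; all names here are illustrative.
 */
public class LogDirFencingSketch {

    /** Master side: move <logs>/<server> aside to <logs>/<server>-dead. */
    static Path fenceServer(Path logsRoot, String serverName) throws IOException {
        Path live = logsRoot.resolve(serverName);
        Path dead = logsRoot.resolve(serverName + "-dead");
        // ATOMIC_MOVE: the rename either happens entirely or not at all.
        return Files.move(live, dead, StandardCopyOption.ATOMIC_MOVE);
    }

    /**
     * Region server side with "mkdir" semantics: createFile does not create
     * missing parents, so it throws NoSuchFileException after fencing.
     */
    static Path rollLogNonRecursive(Path logsRoot, String serverName, String logName)
            throws IOException {
        return Files.createFile(logsRoot.resolve(serverName).resolve(logName));
    }

    /**
     * Unsafe variant with "mkdirs" semantics: the parent is re-created, so the
     * late writer gets a fresh directory the master will never split.
     */
    static Path rollLogRecursive(Path logsRoot, String serverName, String logName)
            throws IOException {
        Path dir = Files.createDirectories(logsRoot.resolve(serverName));
        return Files.createFile(dir.resolve(logName));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hlogs"); // left in place for inspection
        Files.createDirectories(root.resolve("rs1"));
        Files.createFile(root.resolve("rs1").resolve("hlog.1"));

        fenceServer(root, "rs1"); // master declares rs1 dead

        try {
            rollLogNonRecursive(root, "rs1", "hlog.2");
            System.out.println("created: fencing failed");
        } catch (NoSuchFileException e) {
            System.out.println("create failed: server must restart");
        }

        rollLogRecursive(root, "rs1", "hlog.2");
        System.out.println("unsafe create succeeded: "
            + Files.exists(root.resolve("rs1").resolve("hlog.2")));
    }
}
```

The rename gives the master a single atomic cut-over point; the non-recursive create is what turns that rename into a fence, since without it the paused server would simply re-create the directory.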

This message is automatically generated by JIRA.

