hbase-issues mailing list archives

From "Karthik Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
Date Fri, 12 Mar 2010 01:59:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844334#action_12844334 ]

Karthik Ranganathan commented on HBASE-2312:

The .exclusive_lock scheme will not work, because the following two operations are not atomic:
-  Check that I own the lock
-  Create the file
A GC pause could hit between the two.
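To make the race concrete, here is a minimal Java sketch. It assumes a hypothetical in-memory model rather than real HDFS or HBase code: `lockOwner` stands in for the .exclusive_lock file, `logDir` for the RS's log directory, and the latch `gcPauseOver` simulates the GC pause landing between the ownership check and the file creation. The class and every name in it are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not HBase code: lockOwner stands in for the .exclusive_lock
// file and logDir for the RS's HLog directory.
class CheckThenCreateRace {

    /** Returns true if the RS creates a new log after the master has taken over. */
    static boolean raceLosesEdit() {
        try {
            return runRace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static boolean runRace() throws InterruptedException {
        AtomicReference<String> lockOwner = new AtomicReference<>("rs1");
        List<String> logDir = Collections.synchronizedList(new ArrayList<>());
        logDir.add("log.1");

        CountDownLatch lockChecked = new CountDownLatch(1); // RS has passed its ownership check
        CountDownLatch gcPauseOver = new CountDownLatch(1); // released once the master is done

        Thread rs = new Thread(() -> {
            if ("rs1".equals(lockOwner.get())) {     // operation 1: check that I own the lock
                lockChecked.countDown();
                try {
                    gcPauseOver.await();             // simulated GC pause between the operations
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                logDir.add("log.2");                 // operation 2: create the file, based on a stale check
            }
        });
        rs.start();

        lockChecked.await();
        lockOwner.set("master");                         // master decides RS #1 is dead
        List<String> splitSet = new ArrayList<>(logDir); // master lists the logs it will split
        gcPauseOver.countDown();                         // RS "wakes up"
        rs.join();

        // The RS's new log exists but was never part of the master's split set.
        return logDir.contains("log.2") && !splitSet.contains("log.2");
    }

    public static void main(String[] args) {
        System.out.println("edit lost: " + raceLosesEdit());
    }
}
```

Reading the owner through an AtomicReference does not help: each operation is atomic on its own, but the decision made by the first is stale by the time the second runs.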

The other scheme (log.i) would work, but it is not very clean. Right now log names have the
format log.<timestamp>. Under this scheme the master would have to list all the files, parse
their names to find the current maximum i, and then create the new file log.(i+1). Also, in
your example, the master could fail to create log.4 if the RS comes out of the GC pause first.
I guess it could then create the next file, but that doesn't feel like a clean solution.
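A rough sketch of the log.i scheme, using a local directory through java.nio.file as a stand-in for HDFS (an assumption; the class and method names are hypothetical). It shows the list/parse/max+1 step and replays the interleaving above, in which the RS creates log.4 before the master does. `Files.createFile` fails atomically if the name already exists, so it is the master's create that loses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the "log.i" scheme on a local directory (not HDFS, not HBase code).
class SequentialLogNames {
    private static final Pattern LOG_NAME = Pattern.compile("log\\.(\\d+)");

    /** Parses every "log.<i>" name and returns "log.(max+1)", or "log.1" if none match. */
    static String nextLogName(List<String> names) {
        long max = 0;
        for (String name : names) {
            Matcher m = LOG_NAME.matcher(name);
            if (m.matches()) {
                max = Math.max(max, Long.parseLong(m.group(1)));
            }
        }
        return "log." + (max + 1);
    }

    static List<String> listNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
    }

    /** Creates dir/name only if it does not exist yet; false means someone else got there first. */
    static boolean tryCreate(Path dir, String name) throws IOException {
        try {
            Files.createFile(dir.resolve(name));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Replays the interleaving from the comment; returns whether the master's create succeeded. */
    static boolean masterCreateAfterRsWakes() {
        try {
            Path dir = Files.createTempDirectory("rs1-logs");
            for (int i = 1; i <= 3; i++) {
                Files.createFile(dir.resolve("log." + i));
            }
            String planned = nextLogName(listNames(dir)); // master lists and picks "log.4"
            tryCreate(dir, "log.4");                      // RS leaves its GC pause and creates log.4 first
            return tryCreate(dir, planned);               // master's create of log.4 now fails
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("master created its log: " + masterCreateAfterRsWakes());
    }
}
```

The master can detect the collision, but recovering (retrying with log.5) means it has to reason about a file the RS may still be writing to, which is the untidiness the comment points at.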

Something else we noticed when looking at the code: does HDFS overwrite an existing file if
another file is created with the same name? If so, the master could clobber a log.4 created
by the RS; I'm not sure what would happen in that case.
For example, in SequenceFile.java:838:
fs.create(name, true, ...); // true is the overwrite flag
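The difference the overwrite flag makes can be illustrated on a local filesystem. This is an analogy, not HDFS: in java.nio, opening with CREATE plus TRUNCATE_EXISTING clobbers an existing file much like overwrite=true in Hadoop's FileSystem.create(Path, boolean overwrite), while CREATE_NEW fails if the file already exists, much like overwrite=false. The class name, the method and the log.4 file are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-filesystem analogy (not HDFS): CREATE + TRUNCATE_EXISTING ~ create(path, overwrite=true),
// CREATE_NEW ~ create(path, overwrite=false).
class OverwriteDemo {

    /** Returns the contents of log.4 after the RS writes an edit and the master re-creates the file. */
    static String masterRecreates(boolean overwrite) {
        try {
            Path log = Files.createTempDirectory("rs1-logs").resolve("log.4");
            // RS creates log.4 and appends an edit.
            Files.write(log, "edit from RS".getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);

            OpenOption[] opts = overwrite
                    ? new OpenOption[] {StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING}
                    : new OpenOption[] {StandardOpenOption.CREATE_NEW};
            try {
                Files.write(log, new byte[0], opts);   // master creates "its" log.4
            } catch (FileAlreadyExistsException e) {
                // overwrite=false: the master's create fails and the RS's edit survives
            }
            return new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("overwrite=true  -> \"" + masterRecreates(true) + "\"");
        System.out.println("overwrite=false -> \"" + masterRecreates(false) + "\"");
    }
}
```

With overwrite semantics the RS's edit silently disappears; with create-if-absent semantics the collision at least surfaces as an error.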

> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.20.3
>            Reporter: Karthik Ranganathan
> There is an extreme corner case in which bad things (i.e. data loss) could happen:
> 1)	RS #1 is about to roll its HLog: it has not yet created the new one, and the old one will get no more writes
> 2)	RS #1 enters a GC pause of death
> 3)	The master lists the HLog files of RS #1 that it has to split (since RS #1 is dead) and starts splitting
> 4)	RS #1 wakes up, creates the new HLog (the previous one was rolled), and appends an edit, which is lost
> The following seems like a possible solution:
> 1)	The master detects that RS #1 is dead
> 2)	The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead)
> 3)	Add mkdir support (as opposed to mkdirs) to HDFS, so that a file create fails if the directory doesn't exist. Dhruba tells me this is very doable.
> 4)	RS #1 comes back up and is unable to create the new HLog. It restarts itself.
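Both the failure and the proposed fix can be modeled on a local filesystem, where renaming a directory is a single operation and `Files.createFile` does not create missing parent directories (the behavior the description asks HDFS to offer via mkdir rather than mkdirs). This is a sketch under those assumptions, not HBase or HDFS code; the directory names rs1 and rs1-dead are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem model of the proposed fix (not HDFS/HBase code). Directory names are illustrative.
class RenameFenceDemo {

    /**
     * Returns true if the waking RS manages to create a new HLog in its log directory.
     * With fence=true the master first renames the directory to "rs1-dead".
     */
    static boolean rsCanRollAfterWake(boolean fence) {
        try {
            Path logs = Files.createTempDirectory("hbase-logs");
            Path rsDir = Files.createDirectory(logs.resolve("rs1"));
            Files.createFile(rsDir.resolve("log.1"));   // old HLog, gets no more writes

            if (fence) {
                // Steps 1-2: master detects RS #1 is dead and renames its log directory.
                Files.move(rsDir, logs.resolve("rs1-dead"), StandardCopyOption.ATOMIC_MOVE);
            }
            // Step 4: RS wakes up and tries to create its new HLog. createFile does not
            // create missing parents, so this fails once the directory has been renamed.
            try {
                Files.createFile(rsDir.resolve("log.2"));
                return true;   // no fence: edits appended here are outside the master's split list
            } catch (IOException e) {
                return false;  // parent is gone: the RS should abort and restart
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without fence, RS rolls: " + rsCanRollAfterWake(false));
        System.out.println("with fence, RS rolls:    " + rsCanRollAfterWake(true));
    }
}
```

The rename acts as a fence: whatever the RS does after waking up, it can no longer add files to the set the master is splitting, so its only option is to fail and restart.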

This message is automatically generated by JIRA.