hbase-issues mailing list archives

From "Karthik Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
Date Mon, 15 Mar 2010 21:07:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845526#action_12845526 ]

Karthik Ranganathan commented on HBASE-2312:
--------------------------------------------

A little confused about your comment. We have the following sequence of actions:

1) Write "intend to roll HLog to new file hlog.N+1" to hlog.N
2) Open hlog.N+1 for append
3) Write "finished rolling" to hlog.N
4) Continue writing to hlog.N+1

If the GC pause hits before step 2, no new log file has been created. The master will take the append lease on hlog.N, and step 3 will fail later. No edits could have gone into the new log.
If the GC pause hits after step 3, the new log file is the one in effect, so no issues there.
If the GC pause hits after step 2 but before step 3, the master will always see the last log file (hlog.N+1), right? So the master will try to take the append lease on hlog.N+1.
  - The master gets the append lease on hlog.N+1, in which case at most the RS does step 3 and fails on step 4.
  - The master does not get the lease on hlog.N+1 and is still waiting for it, in which case the RS logs the edits to hlog.N+1 and then quits. The master does not lose the edits.
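
To make the cases above concrete, here is a minimal sketch in Python. It is a toy model, not HBase or HDFS code: LogFile, LeaseLost and roll_with_pause are hypothetical names, and it only covers the branch where the master does get the append lease on the newest log file it sees. It assumes the RS pauses right after a given step, and that an append by a writer who no longer holds the lease fails.

```python
class LeaseLost(Exception):
    """Raised when a writer appends to a log whose lease it no longer holds."""


class LogFile:
    """Toy stand-in for an HLog file with a single append lease (assumption)."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.entries = []

    def append(self, writer, entry):
        if writer != self.owner:
            raise LeaseLost(f"{writer} lost the lease on {self.name}")
        self.entries.append(entry)


def roll_with_pause(pause_after_step):
    """Run the four-step roll; the RS pauses after the given step and the
    master takes the lease on the newest log file that exists at that point.

    Returns (step at which the RS fails, edits that reached hlog.N+1).
    """
    logs = [LogFile("hlog.N", owner="rs")]
    steps = [
        lambda: logs[0].append("rs", "intend to roll HLog to hlog.N+1"),  # step 1
        lambda: logs.append(LogFile("hlog.N+1", owner="rs")),             # step 2
        lambda: logs[0].append("rs", "finished rolling"),                 # step 3
        lambda: logs[1].append("rs", "edit"),                             # step 4
    ]
    failed_at = None
    for number, step in enumerate(steps, start=1):
        try:
            step()
        except LeaseLost:
            failed_at = number
            break
        if number == pause_after_step:
            # GC pause: the master declares the RS dead and takes the lease
            # on the last log file it can see.
            logs[-1].owner = "master"
    new_log_edits = logs[1].entries if len(logs) > 1 else []
    return failed_at, new_log_edits


for pause in (1, 2, 3):
    print(pause, roll_with_pause(pause))
```

In this model a pause before step 2 makes step 3 fail, and a pause after step 2 or after step 3 makes step 4 fail; in no case does an edit land in hlog.N+1 after the master has taken over.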

What is the scenario when the master chases the RS? The only thing I can think of is that step 2 takes a long time - but presumably the detection of the RS being dead takes longer?

> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.20.3
>            Reporter: Karthik Ranganathan
>
> There is a rare corner case in which bad things could happen (i.e. data loss):
> 1) RS #1 is going to roll its HLog - not yet created the new one, old one will get no more writes
> 2) RS #1 enters GC Pause of Death
> 3) Master lists HLog files of RS #1 that it has to split as RS #1 is dead, starts splitting
> 4) RS #1 wakes up, creates the new HLog (previous one was rolled) and appends an edit - which is lost
> The following seems like a possible solution:
> 1) Master detects RS #1 is dead
> 2) The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead)
> 3) Add mkdir support (as opposed to mkdirs) to HDFS - so that a file create fails if the directory doesn't exist. Dhruba tells me this is very doable.
> 4) RS #1 comes back up and is not able to create the new hlog. It restarts itself.
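
The rename-based fencing proposed in the description can be illustrated on a local filesystem. This is only an analogy, assuming a file create that never creates missing parent directories (which is how Python's open() behaves); it does not use HDFS, and the function names and the "rs1" directory are hypothetical.

```python
import os
import tempfile


def create_non_recursive(path):
    """Create a new file; fails if the parent directory does not exist."""
    with open(path, "x"):
        pass


def fence_and_retry():
    with tempfile.TemporaryDirectory() as root:
        rs_dir = os.path.join(root, "rs1")  # stands in for /hbase/.logs/<regionserver name>
        os.mkdir(rs_dir)
        create_non_recursive(os.path.join(rs_dir, "hlog.1"))

        # The master detects the RS as dead and renames its log directory.
        dead_dir = rs_dir + "-dead"
        os.rename(rs_dir, dead_dir)

        # The RS wakes up and tries to create the next hlog in the old location.
        try:
            create_non_recursive(os.path.join(rs_dir, "hlog.2"))
        except FileNotFoundError:
            return "create failed", sorted(os.listdir(dead_dir))
        return "create succeeded", sorted(os.listdir(dead_dir))


print(fence_and_retry())
```

Because the create does not recreate the renamed directory, the late writer fails instead of silently producing a log file the master never sees.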

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

