hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog
Date Mon, 15 Mar 2010 20:11:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845494#action_12845494 ]

Todd Lipcon commented on HBASE-2312:
------------------------------------

bq. you are assuming that the HMaster opens the last log file first

Yeah, and I think the HMaster needs to "chase" the regionserver: after it opens the last log,
it looks for an "intent to roll" marker at the end, and if it finds one, opens the next log. It
has to keep doing this until it finds a log that doesn't end in "intent to roll"; otherwise we're
susceptible to a really unlikely double-roll condition.
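A minimal sketch of that chase loop, assuming a hypothetical in-memory model rather than the real HLog format: each log is a list of string entries, `logs` stands for the logs currently on disk (oldest first), `lastListed` is the newest log in the master's original listing, and `INTENT_TO_ROLL` is an illustrative marker record. None of these names are HBase APIs. It shows why the master must keep going until a log does not end in the marker: stopping after one hop would miss a double roll.

```java
import java.util.List;

// Hypothetical model of the master "chasing" a region server's logs.
// The marker and method names are illustrative only, not HBase APIs.
final class LogChaser {
    // Hypothetical marker record written at the end of a log before rolling.
    static final String INTENT_TO_ROLL = "INTENT_TO_ROLL";

    static boolean endsWithIntentToRoll(List<String> log) {
        return !log.isEmpty() && log.get(log.size() - 1).equals(INTENT_TO_ROLL);
    }

    /**
     * Starting from the last log the master listed, follow intent-to-roll
     * markers forward and return the index of the newest log that must be split.
     * Throws if a log announces a roll but its successor is not visible yet;
     * what the master should do then (wait and re-list, or fence the server)
     * is left open in this sketch.
     */
    static int lastLogToSplit(List<List<String>> logs, int lastListed) {
        int current = lastListed;
        // Keep going until we reach a log that does NOT end in an intent to roll;
        // stopping after a single hop would miss a double roll.
        while (endsWithIntentToRoll(logs.get(current))) {
            if (current + 1 >= logs.size()) {
                throw new IllegalStateException(
                    "log " + current + " announces a roll but the next log is not visible yet");
            }
            current++;
        }
        return current;
    }

    public static void main(String[] args) {
        // The master listed only log 0, but the server then rolled twice.
        List<List<String>> logs = List.of(
            List.of("edit-1", INTENT_TO_ROLL),
            List.of("edit-2", INTENT_TO_ROLL),
            List.of("edit-3"));
        System.out.println(lastLogToSplit(logs, 0)); // prints 2
    }
}
```

Throwing when the successor log is missing is a deliberate simplification: it surfaces the window in which the server has announced a roll but not yet created the new file, which is exactly the race the issue describes.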

> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.20.3
>            Reporter: Karthik Ranganathan
>
> There is a very rare corner case in which bad things (i.e. data loss) could happen:
> 1)	RS #1 is about to roll its HLog: it has not yet created the new one, and the old one will get no more writes
> 2)	RS #1 enters a GC Pause of Death
> 3)	The master, treating RS #1 as dead, lists the HLog files of RS #1 that it has to split and starts splitting
> 4)	RS #1 wakes up, creates the new HLog (the previous one was rolled) and appends an edit, which is lost because the new HLog was not in the master's listing
> The following seems like a possible solution:
> 1)	The master detects that RS #1 is dead
> 2)	The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead)
> 3)	Add mkdir support (as opposed to mkdirs) to HDFS, so that a file create fails if the parent directory doesn't exist. Dhruba tells me this is very doable.
> 4)	RS #1 comes back up and is not able to create the new HLog. It restarts itself.

-- 
This message is automatically generated by JIRA.

