hbase-issues mailing list archives

From "chunhui shen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
Date Fri, 25 Nov 2011 05:06:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991 ]

chunhui shen commented on HBASE-4862:

@Ted @Todd

I'm sorry my explanation was not clear.
I think I should describe the detailed case first.

Throughout the following process, a client is putting data to region C.
1. Region C is successfully moved from server A to server B.
At this moment, there are log entries for region C in both server A's log file and server B's
log file.

2. Kill server A and server B.

3. Restart server B.
Now the master starts ServerShutdownHandler for server B and assigns region C to server D.

4. Before region C is opened on server D, restart server A.
Now the master starts ServerShutdownHandler for server A and splits server A's log file.
Because there are log entries for region C in server A's log file (why? see step 1), the split hlog
thread creates a file F in region C's recovered.edits directory.

5. While opening, region C executes replayRecoveredEdits() and then deletes file F.

6. Therefore, the split in step 4 throws an IOException saying that file F does not exist, which stops
the parsing of server A's current hlog file, and the remaining data in that hlog file is lost.
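
The steps above can be sketched as a small, deterministic model. This is only an illustration under stated assumptions: the dict `files` stands in for HDFS, `split_log` and `open_region_c` are hypothetical names, and the concurrent region open is injected through a hook instead of a real thread. None of this is HBase code.

```python
def split_log(files, entries, before_entry=None):
    """Route each (region, edit) pair to that region's recovered.edits file.

    `before_entry(i)` is a hook used here to inject a concurrent action.
    Returns the number of edits written before the splitter gave up.
    """
    created, written = set(), 0
    try:
        for i, (region, edit) in enumerate(entries):
            if before_entry:
                before_entry(i)
            path = f"{region}/recovered.edits/F"
            if path not in created:
                files[path] = []          # splitter creates the writer once
                created.add(path)
            if path not in files:         # file vanished under the writer
                raise IOError(f"File does not exist: {path}")
            files[path].append(edit)
            written += 1
    except IOError:
        pass  # the splitter stops parsing the rest of this log
    return written


def open_region_c(i):
    # Before the second entry is written, region C opens on server D:
    # it replays its recovered.edits and then deletes file F (step 5).
    if i == 1:
        files.pop("C/recovered.edits/F", None)


files = {}
# Hypothetical content of server A's log: edits for C plus other regions.
server_a_log = [("C", "e1"), ("C", "e2"), ("X", "e3"), ("Y", "e4")]
written = split_log(files, server_a_log, before_entry=open_region_c)
lost = [edit for _, edit in server_a_log[written:]]
```

In this model the splitter stops at the first failed append, so the edits for the unrelated regions X and Y are never written, which mirrors the loss described in step 6.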

The region server log I posted is server B's log, taken while it was running replayRecoveredEditsIfAny().
Although it prints a failed delete of the file recovered.edits/0000000013156791680, the file has in
fact been deleted, and the master throws a file-not-exist exception:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1
Got while writing log entry to log org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
File does not exist.
I'm not sure whether this is clear now. I'm waiting for your questions.


> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>         Attachments: 4862.patch
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and during replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log file.
> If skipError = true, the log is added to the corrupt logs. However, the data for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course the filesystem is OK, so it only prints an error log and continues assigning regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits file
> that the split hlog thread is still appending to.
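
One possible way to read the proposal above is sketched below. It is an assumption made for illustration, not a description of the attached patch: the splitter writes to a file carrying a hypothetical `.temp` suffix, the region open skips such in-progress files, and the splitter publishes the file under its final name once it has finished. The dict `files` again stands in for HDFS, and all function names are made up for this sketch.

```python
TMP_SUFFIX = ".temp"  # hypothetical marker for "still being written"


def replay_recovered_edits(files, region):
    """Replay and delete finished edits files for `region`; skip in-progress ones."""
    prefix = f"{region}/recovered.edits/"
    replayed = []
    for path in sorted(files):  # sorted() copies the keys, so removal is safe
        if path.startswith(prefix) and not path.endswith(TMP_SUFFIX):
            replayed.extend(files.pop(path))
    return replayed


def finish_split(files, region, name):
    """The splitter is done: publish the temp file under its final name."""
    prefix = f"{region}/recovered.edits/"
    files[prefix + name] = files.pop(prefix + name + TMP_SUFFIX)


files = {"C/recovered.edits/F.temp": ["e1"]}      # splitter is mid-write
first = replay_recovered_edits(files, "C")       # region C opens concurrently
files["C/recovered.edits/F.temp"].append("e2")   # the append still succeeds
finish_split(files, "C", "F")
second = replay_recovered_edits(files, "C")      # a later replay sees both edits
```

In this model the concurrent open no longer removes the file under the writer, so the splitter never hits the file-not-exist error. Whether and when the region should replay the late edits is a separate question that the sketch does not address.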

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

