hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-6144) Master mistakenly splits live server's HLog file
Date Fri, 01 Jun 2012 04:05:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287137#comment-13287137
] 

ramkrishna.s.vasudevan edited comment on HBASE-6144 at 6/1/12 4:05 AM:
-----------------------------------------------------------------------

@Ted
This problem is in which version of HBase? and is it from any of the recent builds that came
in the last 3 or 4 days?
                
      was (Author: ram_krish):
    @Ted
This problem is in which version of HBase? 
                  
> Master mistakenly splits live server's HLog file
> ------------------------------------------------
>
>                 Key: HBASE-6144
>                 URL: https://issues.apache.org/jira/browse/HBASE-6144
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zhihong Yu
>
> RS abcdn0590 is live, but Master does not have it on its onlineserver list. So, Master
put up the hlog for splitting as shown in the Master log below:
> {code}
> 2012-05-17 21:43:57,692 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
acquired by abcdn0770.xyz.com,60020,1337315956278. 
> {code}
> After splitting succeeded, Master deleted the file:
> {code}
> 2012-05-17 21:43:58,721 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback:
deleted /hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
> {code}
> RS abcdn0590 lost the lease to RS abcdn0770, and try to do a Log Roller which closes
the current hlog, and create a new one, as shown in the namenode log:
> {code}
> 2012-05-17 21:43:58,422 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(newblock=blk_2867982016684075739_12741027,
file=/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711,
newgenerationstamp=12911920, newlength=134, newtargets=[10.115.13.24:50010, 10.115.15.46:50010,
10.115.15.23:50010]) successful
> 2012-05-17 21:43:59,883 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
blk_3811725326431482476_12913541{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.115.13.24:50010|RBW],
ReplicaUnderConstruction[10.115.17.18:50010|RBW], ReplicaUnderConstruction[10.115.17.15:50010|RBW]]}
> {code}
>  
> When RS 0590 try to close the old hlog 1337315957711, it received fatal error below due
to the original hlog is already deleted. The fatal error will cause RS abcdn0590 to shutdown
itself later.
> {code}
> 2012-05-17 21:43:58,889 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@abcdn0590.xyz.com,60020,1337315957185
reported a fatal error:
> ABORTING region server abcdn0590.xyz.com,60020,1337315957185: IOE in log roller
> Cause:
> java.io.FileNotFoundException: File does not exist: hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:583)
>         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> {code}
>  
> RS abcdn0590 shutdown at around 21:44. But in the /hbase/.logs dir, it left two sub folder
for the RS abcdn0590 with the same startcode 1337315957185 , they are
> ·         /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/
> ·         /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/
>  
> Later on, at around 21:46:30, Master retry log splitting, this time,  it still consider
RS abcdn0590 as dead RS and try to put up its hlog for others to grab and split. It finds
the folder /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/, and the first step it does
is to rename it to adding suffix of –splitting.  However, the same folder already exist.
The rename function does not handle the case where the destination folder already exist, instead,
the behavior is putting the src folder under the dst folder, so the path structure looks like
dst/src/file. In our case, It is /hbase/.logs.20120518.1204/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
>  
> This is from the master log, we can see that two folders for the same RS 0590 at same
startcode exists:
> {code}
> 2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1329941607395-splitting doesn't
belong to a known region server, splitting
> 2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185 doesn't belong to
a known region server, splitting
> 2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting doesn't
belong to a known region server, splitting
>  
> 2012-05-17 21:46:30,962 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: Renamed
region directory: hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting
> {code}
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message