hbase-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6153) RS aborted due to rename problem (maybe a race)
Date Tue, 05 Jun 2012 17:16:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289542#comment-13289542 ]

Devaraj Das commented on HBASE-6153:
------------------------------------

I am fairly sure that, because the master couldn't successfully record the split event for
the parent region 106faef9e54a59c1c2b931d1fc36bdbe, it didn't know about the daughter 8974506aa04c5a04e5cc23c11de0039d,
and hence no master action was taken on the daughter when the table was dropped.
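
A minimal sketch of the suspected sequence, not HBase code. It assumes that the daughter region stayed open on the region server while the table drop removed its directory, which is one reading of the logs in the issue description. The local filesystem stands in for HDFS, and the class, method, and file names are hypothetical. The flush writes a file under .tmp and then renames it into f1; once f1 is gone the rename reports failure instead of throwing, and the file is never at its destination when the reader tries to open it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration only: local files stand in for HDFS paths.
class RenameRaceSketch {

    // Stand-in for the table drop: removes the directory tree bottom-up.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Stand-in for the flush: write under .tmp, then rename into the family dir.
    // Returns whatever the rename reports; false means the file never arrived.
    static boolean flushAndRename(Path regionDir) throws IOException {
        Path tmpDir = Files.createDirectories(regionDir.resolve(".tmp"));
        Path flushed = tmpDir.resolve("flushed-file");
        Files.write(flushed, new byte[] {1, 2, 3});
        Path dest = regionDir.resolve("f1").resolve("flushed-file");
        // File.renameTo signals failure through its boolean result, so the
        // caller has to check it; here the destination's parent may be missing.
        return flushed.toFile().renameTo(dest.toFile());
    }

    // Runs the flush with or without a preceding drop of the table directory.
    static boolean simulate(boolean dropBeforeFlush) throws IOException {
        Path tableDir = Files.createTempDirectory("table-sketch");
        Path regionDir = tableDir.resolve("daughter-region");
        Files.createDirectories(regionDir.resolve("f1"));
        if (dropBeforeFlush) {
            deleteRecursively(tableDir); // the daughter is not closed first
        }
        try {
            return flushAndRename(regionDir);
        } finally {
            if (Files.exists(tableDir)) {
                deleteRecursively(tableDir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rename without drop: " + simulate(false));
        System.out.println("rename after drop:   " + simulate(true));
    }
}
```

Under these assumptions the rename after the drop fails because the destination's parent directory no longer exists, which matches the NameNode message quoted in the issue description.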

The fix in HBASE-6070 should address this particular issue of the master not noticing split events.

Let me know if the analysis sounds right. We can resolve this issue, if so.
                
> RS aborted due to rename problem (maybe a race)
> -----------------------------------------------
>
>                 Key: HBASE-6153
>                 URL: https://issues.apache.org/jira/browse/HBASE-6153
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>
> I had a RS crash with the following:
> 2012-05-31 18:34:42,534 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,536 WARN org.apache.hadoop.hbase.regionserver.Store: Unable to rename hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,541 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1288)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1172)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1114)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: File does not exist: /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1901)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1892)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:636)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:470)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
>         at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:595)
> On the NameNode logs:
> 2012-05-31 18:34:42,588 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35 because destination's parent does not exist
> I haven't looked deeply yet but I guess it is a race of some sort.


        
