hbase-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6679) RegionServer aborts due to race between compaction and split
Date Wed, 26 Sep 2012 03:38:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463494#comment-13463494 ]

Devaraj Das commented on HBASE-6679:
------------------------------------

Okay, I did some digging into the logs (the ones attached to the jira earlier) and the code.
It doesn't look like a race between compaction and split (apologies for the confusion I might
have created). The two are sequential: at the end of a compaction, a split is requested.
Note, though, that the split runs in a separate thread.

The problem is that the daughter tries to open a reader to a file that doesn't exist. 
{noformat}
java.io.IOException: Failed ip-10-4-197-133.ec2.internal,60020,1346119706203-daughterOpener=4efb1c92918bbf3c54d0ead3345bb735
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:368)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:456)
	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hbase/data/TestLoadAndVerify_1346120615716/5689a8785bbc9a8aa8e526cd7ef1542a/f1/5a55df83829f401993d95ecf2e539ba1
{noformat}

The method SplitTransaction.createDaughters creates the reference files (via a call to the
method SplitTransaction.splitStoreFiles) that the daughter then tries to open. The list of
files to create references to is the set of entries in the storeFiles field in Store.java
(obtained via the call to this.parent.close in createDaughters). The storeFiles field is last
updated (in the thread doing the compaction) in the method Store.completeCompaction.

My suspicion is that the problem arises because accesses to storeFiles are not synchronized,
and the field is not volatile either. This can leave the compaction thread and the split thread
with inconsistent views of the field, so the split thread doesn't see its last updated value.

If the above theory is right (and it is only a theory at this point), then the solution could
be to make the storeFiles field volatile.
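
To make the proposal concrete, here is a minimal sketch of the pattern, not the actual Store.java
code. The class name, the String file names and the method signatures are hypothetical stand-ins;
only the storeFiles field name and the completeCompaction/close roles come from the description
above. It assumes the list is replaced wholesale with an immutable copy, so a single volatile
write publishes the new contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Store.java; only the storeFiles name is taken from the real code.
class StoreFilesVisibilitySketch {
    // volatile: a write by one thread is visible to subsequent reads by other threads.
    private volatile List<String> storeFiles;

    StoreFilesVisibilitySketch(List<String> initialFiles) {
        this.storeFiles = Collections.unmodifiableList(new ArrayList<>(initialFiles));
    }

    // Compaction thread: swap the compacted inputs for the single output file.
    // Assumes one writer at a time; concurrent writers would still need a lock.
    void completeCompaction(Collection<String> compactedInputs, String compactedOutput) {
        List<String> updated = new ArrayList<>(storeFiles);
        updated.removeAll(compactedInputs);
        updated.add(compactedOutput);
        // Publish a fresh immutable list with a single volatile write.
        storeFiles = Collections.unmodifiableList(updated);
    }

    // Split thread: stand-in for the file list that parent.close() hands to createDaughters.
    List<String> close() {
        return storeFiles;
    }

    public static void main(String[] args) throws InterruptedException {
        StoreFilesVisibilitySketch store =
            new StoreFilesVisibilitySketch(Arrays.asList("hfile-a", "hfile-b"));
        Thread compaction = new Thread(
            () -> store.completeCompaction(Arrays.asList("hfile-a", "hfile-b"), "hfile-c"));
        compaction.start();
        compaction.join();
        Thread split = new Thread(
            () -> System.out.println("split sees: " + store.close()));
        split.start();
        split.join();
    }
}
```

Note that the join() calls in main already order the two threads, so the demo only shows the
shape of the change; it does not reproduce the stale read.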

Thoughts?
                
> RegionServer aborts due to race between compaction and split
> ------------------------------------------------------------
>
>                 Key: HBASE-6679
>                 URL: https://issues.apache.org/jira/browse/HBASE-6679
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.92.3
>
>         Attachments: rs-crash-parallel-compact-split.log
>
>
> In our nightlies, we have seen RS aborts due to compaction and split racing. Original
> parent file gets deleted after the compaction, and hence, the daughters don't find the parent
> data file. The RS kills itself when this happens. Will attach a snippet of the relevant RS
> logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
