hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10370) Compaction in out-of-date Store causes region split failed
Date Fri, 17 Jan 2014 06:10:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874481#comment-13874481 ]

chunhui shen commented on HBASE-10370:
--------------------------------------

It seems the 'store' object held by a CompactionRequest can be the wrong (out-of-date) one.

Should we cache the family name in CompactionRequest rather than the store object?
Then we would get the store object through HRegion#getStore when it is needed.

Nice find!
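
To make the suggestion above concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: Store, Region, ObjectCachingRequest and NameCachingRequest are hypothetical stand-ins, and Region.getStore only mirrors the idea of HRegion#getStore. It assumes that a rolled-back split replaces every store object in the region, as the issue description says.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these classes are the real HBase classes. */
public class FamilyNameLookupSketch {

    /** Hypothetical stand-in for a Store; only its identity matters here. */
    static final class Store {
        final String family;
        Store(String family) { this.family = family; }
    }

    /** Hypothetical stand-in for HRegion; getStore mirrors the idea of HRegion#getStore. */
    static final class Region {
        private final Map<String, Store> stores = new HashMap<>();

        /** Replaces every Store object, as a split rollback is described to do. */
        void initializeStores(String... families) {
            stores.clear();
            for (String family : families) {
                stores.put(family, new Store(family));
            }
        }

        Store getStore(String family) { return stores.get(family); }
    }

    /** Pins the Store object when the request is created. */
    static final class ObjectCachingRequest {
        private final Store store;
        ObjectCachingRequest(Region region, String family) {
            this.store = region.getStore(family);
        }
        Store resolveStore() { return store; }
    }

    /** Caches only the family name and looks the Store up when it is needed. */
    static final class NameCachingRequest {
        private final Region region;
        private final String family;
        NameCachingRequest(Region region, String family) {
            this.region = region;
            this.family = family;
        }
        Store resolveStore() { return region.getStore(family); }
    }

    public static void main(String[] args) {
        Region region = new Region();
        region.initializeStores("A");
        ObjectCachingRequest byObject = new ObjectCachingRequest(region, "A");
        NameCachingRequest byName = new NameCachingRequest(region, "A");

        // A rolled-back split reinitializes the stores before the request runs.
        region.initializeStores("A");

        System.out.println(byObject.resolveStore() == region.getStore("A")); // false
        System.out.println(byName.resolveStore() == region.getStore("A"));   // true
    }
}
```

In this model the request that cached the object still points at the discarded store, while the request that cached the name resolves to the store the region currently uses. Whether a late lookup is sufficient in HBase itself (for example, if the selected file list is also held by the request) is not shown here.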

> Compaction in out-of-date Store causes region split failed
> ----------------------------------------------------------
>
>                 Key: HBASE-10370
>                 URL: https://issues.apache.org/jira/browse/HBASE-10370
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.94.3, 0.99.0
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>         Attachments: HBASE-10370-v1.diff
>
>
> In our production cluster, we encountered a problem where two daughter regions could not be
opened because of a FileNotFoundException.
> {quote}
> 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running
rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.;
Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
> java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File
does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
> ....
> {quote}
> The reason is that a compaction in an out-of-date Store deletes the hfiles that are
referenced by the daughter regions after the split. As a result, the daughter regions can
never be opened.
> The timeline is as follows.
> Assumption: there are two hfiles, a and b, in Store A in Region R.
> t0: A compaction request for Store A(a+b) in Region R is sent.
> t1: First split of Region R. This split times out and is rolled back. During the rollback,
the region reinitializes all store objects; see SplitTransaction #824. Now the store in Region
R is A'(a+b).
> t2: The compaction requested at t0 runs (hfiles: a + b -> c): A(a+b) -> A(c). Hfiles
a and b are archived.
> t3: Another split of Region R. R splits into two regions, R.0 and R.1, which create hfile
references to hfiles a and b from Store A'(a+b).
> t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1 fails
with a FileNotFoundException.
> I have added a test to identify this problem.
> After searching JIRA, maybe HBASE-8502 is the same problem. [~goldin]
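
The t0 to t4 timeline in the description can be replayed with a small, self-contained Java sketch. This is only an illustration under simplifying assumptions: the "disk" is a set of file names, a store is a snapshot of that set taken when the store object is created, and Store, compact and runTimeline are hypothetical names rather than HBase APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy replay of the t0..t4 timeline; not HBase code. */
public class StaleStoreTimelineSketch {

    /** Hypothetical store: remembers the hfiles it saw on disk when it was created. */
    static final class Store {
        final Set<String> hfiles;
        Store(Set<String> onDisk) { this.hfiles = new TreeSet<>(onDisk); }
    }

    /** Compacts every file of the given store into one output and removes the inputs from disk. */
    static void compact(Store store, Set<String> disk, String output) {
        disk.removeAll(store.hfiles);
        disk.add(output);
        store.hfiles.clear();
        store.hfiles.add(output);
    }

    /** Returns the hfile references that the daughter regions cannot resolve. */
    static List<String> runTimeline() {
        Set<String> disk = new TreeSet<>(Arrays.asList("a", "b"));
        Store current = new Store(disk);   // Store A(a+b)

        Store requested = current;         // t0: the request captures Store A
        current = new Store(disk);         // t1: rollback reinitializes the store: A'(a+b)
        compact(requested, disk, "c");     // t2: compaction runs on the stale A; a and b leave disk

        // t3: the second split builds references from the region's current store, A'.
        List<String> references = new ArrayList<>(current.hfiles);

        // t4: opening the daughters fails for every reference that is no longer on disk.
        List<String> missing = new ArrayList<>();
        for (String ref : references) {
            if (!disk.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(runTimeline()); // prints [a, b]
    }
}
```

The sketch shows only the bookkeeping mismatch: A' never learns that a and b were replaced by c, so the references created at t3 point at files that are gone.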



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)