hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10370) Compaction in out-of-date Store causes region split failed
Date Fri, 17 Jan 2014 19:22:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875142#comment-13875142 ]

Hadoop QA commented on HBASE-10370:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision .
  ATTACHMENT ID: 12623710

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8463//console

This message is automatically generated.

> Compaction in out-of-date Store causes region split failed
> ----------------------------------------------------------
>                 Key: HBASE-10370
>                 URL: https://issues.apache.org/jira/browse/HBASE-10370
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 0.94.3, 0.98.0, 0.99.0
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.2, 0.99.0
>         Attachments: 10370-v3.patch, 10370v2.096.txt, HBASE-10370-v1.diff, HBASE-10370-v2.diff
> In our production cluster, we encountered a problem where two daughter regions could not be opened because of a FileNotFoundException.
> {quote}
> 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running
rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.;
Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
> java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
>         at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File
does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
> ....
> {quote}
> The reason is that a compaction in an out-of-date Store deletes hfiles that the daughter regions reference after the split. As a result, the daughter regions can never be opened.
> The timeline is as follows.
> Assumption: there are two hfiles, a and b, in Store A of Region R.
> t0: A compaction request for Store A(a+b) in Region R is sent.
> t1: First split of Region R. This split times out and is rolled back. During the rollback, the region reinitializes all store objects (see SplitTransaction #824). The store in Region R is now A'(a+b).
> t2: The compaction requested at t0 runs (hfile: a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived.
> t3: Second split of Region R. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b).
> t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1 fails with FileNotFoundException.
> I have added a test to identify this problem.
> After searching JIRA, HBASE-8502 may be the same problem. [~goldin]
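
The t0 to t4 timeline in the quoted description can be replayed with a small toy model. The sketch below is not HBase code: `FakeFs`, `ToyStore`, and `missingAfterSplit` are hypothetical names invented for illustration, and archiving is modeled simply as removing the file from the store directory. It only shows how a compaction holding a stale store object can remove files that a newer store object still lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these classes are hypothetical and are not HBase APIs.
class StaleStoreRace {

    /** Stand-in for the file system: the hfile names that currently exist. */
    static final class FakeFs {
        final Set<String> files = new HashSet<>();
    }

    /** Stand-in for a Store object: an in-memory view of the store's hfiles. */
    static final class ToyStore {
        final FakeFs fs;
        final List<String> hfiles;

        ToyStore(FakeFs fs, List<String> hfiles) {
            this.fs = fs;
            this.hfiles = new ArrayList<>(hfiles);
        }

        /** Merges every hfile in this view into one output and removes the inputs. */
        void compact(String output) {
            fs.files.removeAll(hfiles);
            fs.files.add(output);
            hfiles.clear();
            hfiles.add(output);
        }
    }

    /** Replays t0..t4 and returns the hfiles the daughters reference but cannot find. */
    static List<String> missingAfterSplit() {
        FakeFs fs = new FakeFs();
        fs.files.addAll(Arrays.asList("a", "b"));

        // t0: the compaction request captures Store A(a+b).
        ToyStore staleA = new ToyStore(fs, Arrays.asList("a", "b"));

        // t1: the rolled-back split re-creates the store object: A'(a+b).
        ToyStore freshA = new ToyStore(fs, staleA.hfiles);

        // t2: the queued compaction runs against the stale object A, not A'.
        staleA.compact("c");

        // t3: the second split builds references from the view held by A'.
        List<String> daughterRefs = new ArrayList<>(freshA.hfiles);

        // t4: opening the daughters fails for every reference whose file is gone.
        List<String> missing = new ArrayList<>();
        for (String ref : daughterRefs) {
            if (!fs.files.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing hfiles: " + missingAfterSplit());
    }
}
```

In this model both references created at t3 point at files removed at t2, which corresponds to the FileNotFoundException in the log above. The model says nothing about how the attached patches address the problem.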

This message was sent by Atlassian JIRA
