hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8502) Eternally stuck Region after split
Date Fri, 10 May 2013 18:37:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654703#comment-13654703 ]

Jean-Daniel Cryans commented on HBASE-8502:

This sounds like something someone else ran into that I helped fix. In their case, they
ran HBCK after a split happened (not sure why they did) and it merged the parent and daughters
into a new region. The problem is that the reference files were still there and were
put alongside the files they refer to. What happens here is that since the parent also
got moved, the referenced files moved too and are no longer where the references point.
This is why, in the case I saw, the region wasn't able to open (it got a FNFE) and kept
getting reassigned by the master.

The fix is to delete the reference files, or at least move them away, since the original file
is right there.
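
As a rough illustration of that cleanup, here is a minimal Python sketch that picks out
the reference files whose original HFile sits in the same directory. It assumes reference
files are named <hfile>.<encoded parent region>, as in the listing from the issue
description; the function names are made up for this example and are not part of HBase or
HBCK. It only inspects names and does not touch HDFS.

```python
import re

# Assumption: a reference file is named "<hfile>.<encoded parent region>",
# while a plain HFile has no dot in its name.
REF_NAME = re.compile(r"^([0-9a-f]+)\.([0-9a-f]+)$")


def split_store_listing(names):
    """Split a family directory listing into plain HFiles and reference files."""
    hfiles = set()
    references = {}  # reference file name -> (hfile name, parent region)
    for name in names:
        match = REF_NAME.match(name)
        if match:
            references[name] = (match.group(1), match.group(2))
        else:
            hfiles.add(name)
    return hfiles, references


def redundant_references(names):
    """Reference files whose original HFile is already in the same directory."""
    hfiles, references = split_store_listing(names)
    return sorted(ref for ref, (hfile, _parent) in references.items()
                  if hfile in hfiles)


# Directory listing copied from the issue description.
listing = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]
print(redundant_references(listing))
```

For the listing in this issue, only the 4f01ecd052ce464d81e79a62ea227d6b reference has its
original next to it. The other two references would need to be checked against wherever
their HFiles ended up before anything is moved.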

[~goldin] can you verify that 79c619508659018ff3ef0887611eb8f7 is really a daughter of 5b9c16898a371de58f31f0bdf86b1f8b?
The RS log should say so at the time of the split, and so should the master log when the split is reported.
If it's not a daughter, then this could definitely be the same issue, as 79c619508659018ff3ef0887611eb8f7
would be the region created by HBCK.
> Eternally stuck Region after split
> ----------------------------------
>                 Key: HBASE-8502
>                 URL: https://issues.apache.org/jira/browse/HBASE-8502
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Dimitri Goldin
>            Priority: Critical
>         Attachments: hbase_lost_parent.txt, stuck_region_exception.txt
> Exact HBase version: 0.92.1-cdh4.1.2
> A couple of days ago I encountered a RIT problem with a single region.
> After an hbck run it started trying to assign a region which has been 
> bouncing between OFFLINE/PENDING_OPEN/OPENING for two days afterwards.
> This was due to a split gone wrong in some way, which led to several 
> reference files being left in the region directory despite the two relevant HFiles being
> copied successfully to the daughter.
> I will try to give as many details as possible, but unfortunately I was
> unable to find any information about the split itself.
> Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.1060306@neofonie.de%3E
> ===
> Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
> Daughter region in question: 79c619508659018ff3ef0887611eb8f7
> Rough sequence from the logs seems to be the following:
> ===
> * Received request to open region:
> documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
> * Setting up tabledescriptor config now ...
> * Opening of region {NAME =>
> 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.',
>      STARTKEY => '7128586022887322720',
>      ENDKEY => '7130716361635801616',
>      ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as 
> * File does not exist: 
> /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9

> [...]
> ===
> What happened was that somehow (and that's the question here) the daughter's
> region folder contained some leftover reference files, which were causing the 
> RegionServer to look up the parent region, which had already been deleted.
> Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
> ==
> 0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
> 47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
> 4f01ecd052ce464d81e79a62ea227d6b
> 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
> eb7dbb09701d4353be24ca82481c4a7e
> == 
> I attached the full FileNotFound Exception.
> Please let me know if I can provide more information or help otherwise.
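
To make the link between a leftover reference file and the missing path in the exception
concrete, here is a small Python sketch. It assumes the directory layout seen in the paths
above (/hbase/<table>/<region>/<family>/<file>) and the <hfile>.<encoded parent region>
naming of reference files; referenced_path is a hypothetical helper, not an HBase API.

```python
import posixpath


def referenced_path(reference_path):
    """Map a reference file in a daughter region to the parent file it points at.

    Assumes the layout /hbase/<table>/<region>/<family>/<file>.
    """
    store_dir, ref_name = posixpath.split(reference_path)
    region_dir, family = posixpath.split(store_dir)
    table_dir, _daughter = posixpath.split(region_dir)
    hfile, sep, parent = ref_name.partition(".")
    if not sep or not parent:
        raise ValueError("not a reference file name: " + ref_name)
    return posixpath.join(table_dir, parent, family, hfile)


ref = ("/hbase/documents/79c619508659018ff3ef0887611eb8f7/d/"
       "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b")
print(referenced_path(ref))
```

For the first reference file in the listing, this yields the same path that the
"File does not exist" message reports, which is consistent with the RegionServer following
the reference into the deleted parent region.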

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
