hbase-issues mailing list archives

From "Dimitri Goldin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8502) Eternally stuck Region after split
Date Wed, 08 May 2013 15:01:24 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimitri Goldin updated HBASE-8502:
----------------------------------

    Attachment: hbase_lost_parent.txt
    
> Eternally stuck Region after split
> ----------------------------------
>
>                 Key: HBASE-8502
>                 URL: https://issues.apache.org/jira/browse/HBASE-8502
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Dimitri Goldin
>            Priority: Critical
>         Attachments: hbase_lost_parent.txt, stuck_region_exception.txt
>
>
> Exact HBase version: 0.92.1-cdh4.1.2
> A couple of days ago I encountered a RIT (region-in-transition) problem with a single region.
> After an hbck run, it started trying to assign a region, which has been
> bouncing between OFFLINE/PENDING_OPEN/OPENING for the two days since.
> This was due to a split that went wrong in some way, which left several
> reference files in the region directory even though the two relevant HFiles had been
> copied successfully to the daughter.
> I will try to give as many details as possible, but unfortunately I was
> unable to find any information about the split itself.
> Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.1060306@neofonie.de%3E
> ===
> Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
> Daughter region in question: 79c619508659018ff3ef0887611eb8f7
> Rough sequence from the logs seems to be the following:
> ===
> * Received request to open region:
> documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
> * Setting up tabledescriptor config now ...
> * Opening of region {NAME =>
> 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.',
>      STARTKEY => '7128586022887322720',
>      ENDKEY => '7130716361635801616',
>      ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as 
> FAILED_OPEN in ZK
> * File does not exist: 
> /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9

> [...]
> ===
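
The region name in the log lines above packs several fields into one string. As a minimal sketch, assuming the layout `<table>,<start key>,<id>.<encoded name>.` seen in this report and a start key that contains no commas, the following Python splits the name and derives the region directory under /hbase. The function names and the `root` parameter are hypothetical helpers for illustration, not part of HBase.

```python
import re

# Layout assumed from the log excerpt in this report:
#   <table>,<start key>,<numeric id>.<32-hex encoded name>.
# Assumption: the start key itself contains no commas.
REGION_NAME = re.compile(
    r"^(?P<table>[^,]+),"
    r"(?P<start_key>[^,]*),"
    r"(?P<region_id>\d+)\."
    r"(?P<encoded>[0-9a-f]{32})\.$"
)


def parse_region_name(name):
    """Split a full region name into its fields (hypothetical helper)."""
    match = REGION_NAME.match(name)
    if match is None:
        raise ValueError("unrecognised region name: %r" % name)
    return match.groupdict()


def region_dir(name, root="/hbase"):
    """Build the region directory path from a full region name."""
    parts = parse_region_name(name)
    return "%s/%s/%s" % (root, parts["table"], parts["encoded"])


if __name__ == "__main__":
    daughter = (
        "documents,7128586022887322720,1363696791400."
        "79c619508659018ff3ef0887611eb8f7."
    )
    print(parse_region_name(daughter))
    print(region_dir(daughter))
```

For the daughter region in this report, the derived directory is the one whose `d` column-family folder is listed further down.
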
> What happened was that somehow (and that's the question here) the daughter's
> region folder contained some left-over reference files, which caused the
> RegionServer to look up the parent region, which had already been deleted.
> Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
> ==
> 0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
> 47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
> 4f01ecd052ce464d81e79a62ea227d6b
> 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
> eb7dbb09701d4353be24ca82481c4a7e
> == 
> I attached the full FileNotFound Exception.
> Please let me know if I can provide more information or help otherwise.
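
The listing above mixes plain HFiles with reference files named `<hfile>.<encoded parent region>`. As a rough sketch, assuming 32-character hex names as in this report, the following Python picks out references whose parent region no longer exists and shows the parent path each one points at. `dangling_references` and its parameters are hypothetical names for illustration; the script only inspects a list of file names and does not touch HDFS or repair anything.

```python
import re

# Reference files in the listing are named <hfile>.<encoded parent region>.
# Assumption: both parts are 32-character lowercase hex strings, as in this report.
REFERENCE = re.compile(r"^(?P<hfile>[0-9a-f]{32})\.(?P<parent>[0-9a-f]{32})$")


def dangling_references(filenames, table, family, existing_regions, root="/hbase"):
    """Return (reference file, parent HFile path) pairs whose parent region is gone.

    existing_regions is the set of encoded region names that still have a
    directory under the table (supplied by the caller; hypothetical input).
    """
    dangling = []
    for name in filenames:
        match = REFERENCE.match(name)
        if match is None:
            continue  # plain HFile, not a reference
        parent = match.group("parent")
        if parent not in existing_regions:
            target = "%s/%s/%s/%s/%s" % (
                root, table, parent, family, match.group("hfile"))
            dangling.append((name, target))
    return dangling


if __name__ == "__main__":
    listing = [
        "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
        "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
        "4f01ecd052ce464d81e79a62ea227d6b",
        "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
        "eb7dbb09701d4353be24ca82481c4a7e",
    ]
    # Only the daughter region directory is assumed to exist; the parent was deleted.
    existing = {"79c619508659018ff3ef0887611eb8f7"}
    for ref, target in dangling_references(listing, "documents", "d", existing):
        print(ref, "->", target)
```

With the listing from this report, the first reference resolves to the same path named in the "File does not exist" log line.
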

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
