hbase-issues mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8502) Eternally stuck Region after split
Date Wed, 08 May 2013 17:47:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652115#comment-13652115 ]

Jimmy Xiang commented on HBASE-8502:
------------------------------------

[~goldin], thanks a lot for the log. We can tell the region was split twice. The first attempt
was around 2013-03-18 18:53:10,164, but it failed and rolled back at 2013-03-18 18:53:59,108.
The second attempt was at 2013-03-18 19:44:59,963, and it failed because an HFile was missing.
The same HFile is the reason the region was stuck in transition.
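One way to put numbers on these intervals is to subtract the quoted log timestamps. This is a minimal Python sketch, assuming the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in the comment (comma before the milliseconds); the helper names are hypothetical and not part of HBase.

```python
from datetime import datetime

# Assumed layout of the log timestamps quoted above, e.g. "2013-03-18 18:53:10,164".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def parse_log_ts(ts: str) -> datetime:
    """Parse a log timestamp; %f right-pads "164" to 164000 microseconds."""
    return datetime.strptime(ts, LOG_TS_FORMAT)

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two log timestamps."""
    return (parse_log_ts(end) - parse_log_ts(start)).total_seconds()

first_split_start = "2013-03-18 18:53:10,164"
first_split_rollback = "2013-03-18 18:53:59,108"
second_split_start = "2013-03-18 19:44:59,963"

# prints "first split until rollback: 48.944 s"
print(f"first split until rollback: {elapsed_seconds(first_split_start, first_split_rollback):.3f} s")
# prints "rollback until second split: 3060.855 s" (about 51 minutes)
print(f"rollback until second split: {elapsed_seconds(first_split_rollback, second_split_start):.3f} s")
```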

The first split took quite some time (roughly 49 seconds from start to rollback). My guess is
that it had trouble accessing the HFile in question. Can you check your HDFS NameNode (NN) and
DataNode (DN) logs for this file?

File does not exist: /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9

Do we know what happened to it?
                
> Eternally stuck Region after split
> ----------------------------------
>
>                 Key: HBASE-8502
>                 URL: https://issues.apache.org/jira/browse/HBASE-8502
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Dimitri Goldin
>            Priority: Critical
>         Attachments: hbase_lost_parent.txt, stuck_region_exception.txt
>
>
> Exact HBase version: 0.92.1-cdh4.1.2
> A couple of days ago I encountered a RIT problem with a single region.
> After an hbck run, it started trying to assign a region that then kept
> bouncing between OFFLINE/PENDING_OPEN/OPENING for two days.
> This was due to a split gone wrong in some way, which left several
> reference files in the region directory even though the two relevant HFiles
> had been copied successfully to the daughter.
> I will try to give as many details as possible, but unfortunately I was
> unable to find any information about the split itself.
> Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.1060306@neofonie.de%3E
> ===
> Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
> Daughter region in question: 79c619508659018ff3ef0887611eb8f7
> Rough sequence from the logs seems to be the following:
> ===
> * Received request to open region:
> documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
> * Setting up tabledescriptor config now ...
> * Opening of region {NAME =>
> 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.',
>      STARTKEY => '7128586022887322720',
>      ENDKEY => '7130716361635801616',
>      ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as 
> FAILED_OPEN in ZK
> * File does not exist: /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9
> [...]
> ===
> What happened was that somehow (and that's the question here) the daughter's
> region folder contained some left-over reference files, which caused the
> RegionServer to look up the parent region, which had already been deleted.
> Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
> ==
> 0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
> 47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
> 4f01ecd052ce464d81e79a62ea227d6b
> 4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
> eb7dbb09701d4353be24ca82481c4a7e
> == 
> I attached the full FileNotFoundException.
> Please let me know if I can provide more information or help otherwise.
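The failure can be traced on paper from the daughter's directory listing: each reference file name combines an HFile name with the parent's encoded region name, which together give the parent path the RegionServer tries to open. This is a hypothetical Python helper; its naming pattern is inferred from the listing (`<hfile>.<parent region>`), not taken from HBase's own parser, and the empty `existing` set is a stand-in for a real HDFS existence check, reflecting that the parent region had already been deleted.

```python
import posixpath
import re

# Hypothetical pattern inferred from the listing: "<hfile>.<parent encoded region>".
# Plain HFiles (no suffix) do not match and are ignored.
REF_NAME = re.compile(r"^([0-9a-f]+)\.([0-9a-f]+)$")

def referenced_parent_paths(table_dir, family, store_files):
    """Map each reference file in a daughter store to the parent HFile it points at."""
    result = {}
    for name in store_files:
        m = REF_NAME.match(name)
        if m:
            hfile, parent_region = m.groups()
            result[name] = posixpath.join(table_dir, parent_region, family, hfile)
    return result

daughter_listing = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]

refs = referenced_parent_paths("/hbase/documents", "d", daughter_listing)

# Stand-in for an HDFS existence check: the parent directory is gone, so nothing exists.
existing = set()
dangling = sorted(p for p in refs.values() if p not in existing)
for path in dangling:
    print("missing:", path)
```

The first dangling path is exactly the one named in the FileNotFoundException, which is consistent with reference files outliving the parent region they point to.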

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
