hbase-issues mailing list archives

From "Sudharsan Sampath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3331) Kill -STOP of RS hosting META does not recover
Date Thu, 25 Aug 2011 11:19:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090942#comment-13090942 ]

Sudharsan Sampath commented on HBASE-3331:

It's more related to the META region only. The following debug info is printed before throwing

2011-08-25 12:46:52,443 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
locateRegionInMeta parentTable=-ROOT-, metaLocation=address: sb6270x1664:60020, regioninfo:
-ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after sleep of 1000 because: Connection

> Kill -STOP of RS hosting META does not recover
> ----------------------------------------------
>                 Key: HBASE-3331
>                 URL: https://issues.apache.org/jira/browse/HBASE-3331
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>         Attachments: timeouts.log.txt
> If you find the server hosting META and kill -STOP its region server, it will eventually
> lose its ZK session and the master will split its logs and try to reassign. However, at some
> point along here it tries to access the old META, and gets SocketTimeoutExceptions, which
> cause it to keep retrying forever. Once I kill -9ed the stopped server, things came back to
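
For illustration only: the debug line in the comment reports "attempt=0 of 10" with a sleep of 1000 between attempts, while the description reports retries that never end. A minimal sketch of a bounded retry loop, which is the behaviour the log line suggests, might look like the following. This is not HBase client code; the function name, its parameters, and the use of Python's socket.timeout as a stand-in for SocketTimeoutException are all assumptions made for the example.

```python
import socket
import time


def call_with_bounded_retries(operation, max_attempts=10, sleep_seconds=1.0,
                              sleep=time.sleep):
    """Run operation(), retrying on socket.timeout at most max_attempts times.

    Hypothetical helper: the defaults mirror the "of 10" and "sleep of 1000"
    values in the quoted log line, assuming that sleep is in milliseconds.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except socket.timeout as err:
            last_error = err
            # Pause between attempts, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(sleep_seconds)
    # Surface the failure instead of looping forever against a frozen server.
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The point of the sketch is only that the loop has an upper bound: once the attempts are exhausted, the caller gets an error and can look up the new location of the region rather than continuing to contact the stopped server.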
