hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-932) Regionserver restart
Date Thu, 16 Oct 2008 18:03:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640246#action_12640246 ]

stack commented on HBASE-932:
-----------------------------

Yeah, we had a babysitter on our cluster.  His name was 'god'.  He got fired though because
he was forever doing restarts when they weren't wanted and just generally interfering
and causing trouble.

There's for sure a place for babysitters.  This issue is based on the postulate that sometimes
the regionserver knows more about its state, or about how it might fix itself, than it could
even reveal to an external generic daemon babysitter.  For a few hdfs error types, a pause and
complete restart -- perhaps attempted N times at most -- could set a regionserver right again.
The danger would be replicating the 'god' behavior.
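
To make the "pause and complete restart, attempted N times at most" idea concrete, here is a
minimal sketch in Java.  It is not HBase code: BoundedRestart, ServerRun and runWithRestarts
are hypothetical names invented for illustration, and treating every IOException as a
restartable hdfs error is an assumption.  A real version would restart only for the few
error types known to be recoverable.

```java
import java.io.IOException;

/** Hypothetical sketch of a bounded self-restart loop; not taken from HBase. */
public class BoundedRestart {

    /** Stand-in for one complete run of a server (assumed interface). */
    public interface ServerRun {
        void run() throws IOException;
    }

    /**
     * Runs the server. On an IOException it pauses and restarts, at most
     * maxRestarts times. Returns the number of restarts used on a clean exit
     * and rethrows the last failure once the restart budget is spent.
     */
    public static int runWithRestarts(ServerRun server, int maxRestarts, long pauseMillis)
            throws IOException, InterruptedException {
        int restarts = 0;
        while (true) {
            try {
                server.run();
                return restarts;            // clean exit, no further restart needed
            } catch (IOException e) {
                if (restarts >= maxRestarts) {
                    throw e;                // budget spent: shut down as before
                }
                restarts++;
                Thread.sleep(pauseMillis);  // pause so the filesystem can recover
            }
        }
    }
}
```

The hard cap on restarts is what keeps this from repeating the 'god' behavior: once the budget
is spent the failure propagates and the server shuts down, as it does today.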

> Regionserver restart
> --------------------
>
>                 Key: HBASE-932
>                 URL: https://issues.apache.org/jira/browse/HBASE-932
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>
> If we drop a flush or we fail to close a write-ahead log, we currently shut down the
> regionserver (we fail because of hdfs usually).  Rather than shut themselves down, how about
> they restart?  At least in the HBASE-930 case, the restart might fix the issue by shaking
> DFSClient so that it comes to its senses again.  Even if HDFS is bad, it'll come around
> eventually.  The HRS restarting itself plus the HBASE-926 fix will make for fast recovery.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

