hbase-issues mailing list archives

From "Gregory Chanan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
Date Thu, 20 Sep 2012 00:35:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459251#comment-13459251 ]

Gregory Chanan commented on HBASE-5843:
---------------------------------------

Again, great work Nicolas.  Some questions/comments:

bq. It means as well that the detection time dominates the other parameters when everything
goes well.
bq. Detection time will become more and more important.
Interesting.  The sad part is we often find ourselves having to *increase* the ZK timeout
in order to deal with Juliet GC pauses.  Given that detection time dominates, perhaps we should
put some effort into correcting that (multiple RS on a single box?).

bq. Replaying should be looked at (more in terms of lock than raw performances).
Why do you say this with respect to locking?  Is the performance not as good as you would
expect?  Or just haven't looked at it yet?

bq. An improvement would be to reassign the region in parallel of the split
I've wondered why we don't do this.  Do you see any implementation challenges with doing this?
 Maybe I'll look into it.
                
> Improve HBase MTTR - Mean Time To Recover
> -----------------------------------------
>
>                 Key: HBASE-5843
>                 URL: https://issues.apache.org/jira/browse/HBASE-5843
>             Project: HBase
>          Issue Type: Umbrella
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>
> A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - failure impact client applications only by an added delay to execute a query, whatever the failure.
> - this delay is always inferior to 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks as stop/start of a cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
