hbase-issues mailing list archives

From "Gregory Chanan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
Date Thu, 20 Sep 2012 05:20:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459356#comment-13459356 ]

Gregory Chanan commented on HBASE-5843:
---------------------------------------

bq. Was reading today that VoltDB does off-heap storage, so it can have a small Java heap and just run with the defaults other than setting -Xmx and -Xms. Could try a few RS per box, each with a small heap (the RS needs to be put on a CPU diet, but we could look at that afterward).

Yes, the CPU usage definitely needs a diet. We could probably start by eliminating a bunch of threads.

bq. Elsewhere nkeywal puts up a suggested prescription where, on a crash, we assign and then split logs (rather than the other way round, as we do now). Nicolas suggests that the assigned regions could immediately start taking writes; we'd throw an exception if a read was attempted before the split completed and the region's edits had been replayed: HBASE-6752

Right, I think HBASE-6752 is a great idea, but it doesn't address serving reads more quickly. I'm wondering if there is more we can do to address that.
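To make the HBASE-6752 proposal concrete, here is a minimal sketch of a region that is assigned before its log edits are replayed: writes are accepted immediately, reads raise an error until replay finishes. Everything here is hypothetical and is not HBase's API: the names `RecoveringRegion`, `RegionState`, `RegionNotReadableError` and `replay_edits`, and the per-row sequence-number rule (the higher sequence number wins, so replayed older edits never overwrite writes made after reassignment), are simplifying assumptions for illustration. Python is used only for brevity.

```python
import enum
import threading


class RegionState(enum.Enum):
    RECOVERING = "recovering"  # assigned, but recovered log edits not yet replayed
    OPEN = "open"              # replay finished, region serves reads and writes


class RegionNotReadableError(Exception):
    """Hypothetical error raised when a read hits a region still replaying edits."""


class RecoveringRegion:
    """Hypothetical model of a region assigned before its log edits are replayed."""

    def __init__(self, name):
        self.name = name
        self.state = RegionState.RECOVERING
        self._store = {}  # row -> (sequence number, value)
        self._lock = threading.Lock()

    def _apply(self, row, value, seq):
        # Assumption: keep the version with the highest sequence number per row,
        # so older replayed edits cannot clobber newer writes.
        current = self._store.get(row)
        if current is None or seq > current[0]:
            self._store[row] = (seq, value)

    def put(self, row, value, seq):
        # Writes are accepted whether or not replay has finished.
        with self._lock:
            self._apply(row, value, seq)

    def replay_edits(self, edits):
        # Apply (row, value, seq) edits recovered by log splitting, then open for reads.
        with self._lock:
            for row, value, seq in edits:
                self._apply(row, value, seq)
            self.state = RegionState.OPEN

    def get(self, row):
        # Reads are rejected until replay completes, as the proposal describes.
        with self._lock:
            if self.state is not RegionState.OPEN:
                raise RegionNotReadableError(f"{self.name} is still replaying edits")
            entry = self._store.get(row)
            return None if entry is None else entry[1]
```

For example, after `region = RecoveringRegion("t1")` and `region.put("a", "new", seq=10)`, calling `region.get("a")` raises `RegionNotReadableError`. Once `region.replay_edits([("a", "old", 5), ("b", "x", 6)])` runs, `region.get("a")` returns `"new"` and `region.get("b")` returns `"x"`. The sketch also shows the limitation raised above: write availability improves, but reads still wait for the whole replay.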
                
> Improve HBase MTTR - Mean Time To Recover
> -----------------------------------------
>
>                 Key: HBASE-5843
>                 URL: https://issues.apache.org/jira/browse/HBASE-5843
>             Project: HBase
>          Issue Type: Umbrella
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>
> A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - a failure impacts client applications only by adding a delay to query execution, whatever the failure.
> - this delay is always less than 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks such as stopping/starting a cluster.

