hbase-issues mailing list archives

From "Nicolas Liochon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8338) Latency Resilience; umbrella list of issues that will help us ride over bad disk, bad region, ec2, etc.
Date Tue, 30 Apr 2013 18:26:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645813#comment-13645813 ]

Nicolas Liochon commented on HBASE-8338:

Agreed, but it can't be done without the client application: it was writing something, and
that write can't complete because the region is not there. For example, we have a callback
during the put process. Today we call this callback only on success. We could also call it
on failure and let the user decide:
- ignore
- stop (today's behavior)
- replace.

Basically, the callback would return a boolean saying whether the process should continue or not.

It's quite easy to do, if that's what we want...
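A minimal sketch of the idea above, assuming a hypothetical failure-aware callback (the `PutCallback` and `BatchPutter` names are illustrative, not the actual HBase client API): the callback's boolean return value decides whether the batch keeps going past a failed put.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical callback: invoked on success, and on failure with a veto.
interface PutCallback {
  // Return true to continue the batch (ignore/replace), false to stop
  // (today's behavior, where the first failure aborts the write).
  boolean onFailure(String row, Exception cause);

  void onSuccess(String row);
}

class BatchPutter {
  private final PutCallback cb;

  BatchPutter(PutCallback cb) {
    this.cb = cb;
  }

  // Simulated batch put: rows hitting an offline region fail.
  // Returns the rows that were actually written.
  List<String> putAll(List<String> rows, Set<String> offlineRegions) {
    List<String> written = new ArrayList<>();
    for (String row : rows) {
      if (offlineRegions.contains(row)) {
        // Region not there: ask the application what to do.
        if (!cb.onFailure(row, new RuntimeException("region offline"))) {
          break; // stop: today's behavior
        }
        // callback returned true: skip this row and continue
      } else {
        written.add(row);
        cb.onSuccess(row);
      }
    }
    return written;
  }
}
```

With a callback that always returns true the batch rides over the bad region; returning false reproduces the current stop-on-failure behavior.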
> Latency Resilience; umbrella list of issues that will help us ride over bad disk, bad
region, ec2, etc.
> -------------------------------------------------------------------------------------------------------
>                 Key: HBASE-8338
>                 URL: https://issues.apache.org/jira/browse/HBASE-8338
>             Project: HBase
>          Issue Type: Umbrella
>          Components: LatencyResilience
>            Reporter: stack
>            Priority: Critical
> Chatting w/ Elliott, we started listing items to fix that would help keep HBase latency
approximately constant as disks go bad, get saturated by a neighbour (EC2), etc.
> I just made a new LatencyResilience issue category to tag issues that contribute to this.
> I have to go at the moment, but when I get back I'll start linking in existing issues that
help this project along, and I'll file new ones.
> Here is what we chatted about:
> + Multiple WALs effort will help keep write latency roughly constant.
> + Figuring out how to start a new read over dfsclient when the current replica read is taking
too long would help keep reads roughly constant (maybe we could exploit the nkeywal hackery
messing w/ replica order).
> + There is an issue where clients can currently pile up on a single region because of
the way we do client queues by regionserver.  This needs fixing.
> The above are a few ideas worth further exploration, at least.
> The idea is to try to bring down our 95th percentiles and make us more robust in the face
of dying disks, etc.  I see this issue rising to the fore now that there has been good progress
on the MTTR project.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
