hbase-dev mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-17963) RegionServers lose file locality on unplanned restart
Date Fri, 28 Apr 2017 12:40:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved HBASE-17963.
--------------------------------
    Resolution: Incomplete

[~bjorn.olsen1@gmail.com], I think this is too vague to attach any actionable development
work to. Discussions about how to fix a problem are best had on the mailing lists.

You might be interested in raising {{hbase.master.balancer.stochastic.localityCost}} to a
value like 400 or 500. This instructs the balancer to make locality a more dominant factor
when balancing your cluster, which should help a cluster that has crashed completely get
back to the "most data locality" state.
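
For illustration only, a minimal sketch of how that setting might look in hbase-site.xml on
the Master. The value 500 is simply the upper figure suggested above, not a tuned
recommendation; the shipped default is considerably lower, so check the default for your
version first. It is an assumption here that the Master must be restarted (or its
configuration reloaded, where supported) for the change to take effect.

```xml
<!-- hbase-site.xml (Master). Illustrative value taken from the suggestion above. -->
<property>
  <name>hbase.master.balancer.stochastic.localityCost</name>
  <value>500</value>
</property>
```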

> RegionServers lose file locality on unplanned restart
> -----------------------------------------------------
>
>                 Key: HBASE-17963
>                 URL: https://issues.apache.org/jira/browse/HBASE-17963
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.1.2
>         Environment: Evident with HDP 2.4.3 running HBase 1.1.2
>            Reporter: Bjorn Olsen
>
> When an HBase cluster crashes, HFile locality is lost. 
> Crashes can happen for a variety of reasons, and in this event having a quick time to
> recover (both data and database performance) is critical.
> On cluster restore, region servers do not load their previous set of regions, which means
> all HFiles must be moved around until locality is achieved again. Performance is poor while
> file locality is not close to 100%.
> A major compaction must be run to move the regions around, which further impacts performance
> and will take longer the more data was in HBase at the time of the crash.
> There is a graceful_stop script which is useful for planned outages - you can first unload
> the regions from the region server, restart it, and then reload the regions to the same server.
> No HFiles need to be moved and file locality is quickly restored.
> However, with an unplanned outage, no record is kept of where the regions were.
> After a crash, HBase randomly assigns regions to region servers and HFile locality is very low.
> We then need to move all the HFiles around until file locality is restored.
> This is fine for a small number of regions and small HFiles but becomes problematic when
> you have a large number of region servers or large files.
> This JIRA is a request to improve this behavior for unplanned outages by trying to restore
> the regions assigned per server, after a cluster restart.
> For example, HBase could keep a list of the region locality at regular intervals, and
> use this as an initial guideline when regions are restarted. Locality might still not be 100%
> immediately - but presumably better than 0%.
> It would be necessary to first disable the load balancer (if enabled) while this restore
> is happening and enable it afterward.
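
To make the proposal in the description concrete, here is a rough sketch of the
"remember the last assignment and prefer it on restart" idea. It is not HBase code and
does not reflect how the Master actually assigns regions: the function name
plan_assignments, the snapshot dictionary, and the least-loaded fallback are all
hypothetical choices made for illustration.

```python
def plan_assignments(snapshot, regions, live_servers):
    """Return {region: server}, preferring each region's last known server.

    snapshot:     {region: server} recorded before the crash (hypothetical)
    regions:      regions that need to be assigned after the restart
    live_servers: region servers that have come back up
    """
    live = list(live_servers)
    if not live:
        raise ValueError("no live region servers")

    plan = {}
    load = {server: 0 for server in live}
    leftover = []

    # First pass: put every region back where it was, if that server is alive.
    for region in regions:
        previous = snapshot.get(region)
        if previous in load:
            plan[region] = previous
            load[previous] += 1
        else:
            leftover.append(region)

    # Second pass: regions with no usable history go to the least-loaded server.
    for region in leftover:
        target = min(live, key=lambda server: load[server])
        plan[region] = target
        load[target] += 1

    return plan


if __name__ == "__main__":
    before_crash = {"r1": "rs1", "r2": "rs2", "r3": "rs3"}
    print(plan_assignments(before_crash, ["r1", "r2", "r3", "r4"], ["rs1", "rs2"]))
```

Regions whose previous server did not come back (or that have no recorded history) fall
through to the second pass, so locality for those would still have to be rebuilt
afterwards, in line with the "might still not be 100%" caveat in the description.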



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
