hbase-issues mailing list archives

From "Thiruvel Thirumoolan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16169) Make RegionSizeCalculator scalable
Date Tue, 15 Nov 2016 03:10:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665872#comment-15665872 ]

Thiruvel Thirumoolan commented on HBASE-16169:

Our primary intention is to use this API in RegionSizeCalculator so that it no longer relies on the
Master for ClusterStatus. On our large clusters, ClusterStatus() alone takes 4-5 minutes, which is
significant for some of our pipelines. And if the Master is down or busy, some of the jobs time out or fail.
This API helps in both scenarios, and that is its primary use case. Hence I also included the
RegionSizeCalculator changes as part of this patch.
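
As a rough illustration of the idea above (not the code in the attached patches), the size lookup can be sketched as a loop that asks each hosting server directly and never calls the Master. Everything named here is hypothetical: RegionLoadSource stands in for the per-RegionServer RPC, the server list is assumed to be already known, and sizes are assumed to arrive in MB.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RegionSizeSketch {

    /** Hypothetical stand-in for a per-RegionServer call; not an HBase interface. */
    public interface RegionLoadSource {
        /** Returns store file size in MB keyed by region name, for one server and one table. */
        Map<String, Long> regionSizesMb(String server, String table) throws Exception;
    }

    /**
     * Collects region sizes in bytes by asking each hosting server directly.
     * A server that fails is skipped, so its regions are absent from the result.
     */
    public static Map<String, Long> sizesForTable(
            RegionLoadSource source, List<String> servers, String table) {
        Map<String, Long> sizes = new HashMap<>();
        for (String server : servers) {
            try {
                Map<String, Long> perServer = source.regionSizesMb(server, table);
                for (Map.Entry<String, Long> e : perServer.entrySet()) {
                    sizes.put(e.getKey(), e.getValue() * 1024L * 1024L);
                }
            } catch (Exception ex) {
                // Assumption: a missing size is tolerable and the caller uses a default.
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Fake source: rs2 is unreachable, every other server hosts one 10 MB region.
        RegionLoadSource fake = (server, table) -> {
            if (server.equals("rs2")) {
                throw new Exception("unreachable");
            }
            Map<String, Long> m = new HashMap<>();
            m.put(server + "-region-a", 10L);
            return m;
        };
        Map<String, Long> sizes = sizesForTable(fake, Arrays.asList("rs1", "rs2", "rs3"), "t1");
        System.out.println(sizes.size() + " regions sized");
    }
}
```

In this sketch one slow or dead server costs only its own regions, whereas a single ClusterStatus call makes every job wait on the Master.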

Other possible uses:

1. If there were a lighter version of the GetClusterStatus API (i.e. without the ServerLoad for each
RS), custom maintenance tools could work better. Today ClusterStatus is heavy.
With the new APIs, each call's payload is smaller and the load is spread across servers. Custom tools
can call getRegionLoad() only when needed, and the result will be more accurate. This helps with large
clusters. For tools that don't need RegionLoad, the lighter version of the API is sufficient.
2. Another use case is a tool like RSTop, since we can see selected metrics at the region level
(possibly even deltas between successive RPCs to the server).
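
The delta idea in item 2 can be sketched as the difference between two successive samples of a per-region counter. This is only an illustration: the class, the method, and the choice of a request counter keyed by region name are assumptions, not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class RegionDeltaSketch {

    /**
     * Returns current minus previous for every region present in both samples.
     * Regions that appear only in the current sample (for example, newly opened
     * on this server) are skipped because there is no baseline to subtract.
     */
    public static Map<String, Long> deltas(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long before = previous.get(e.getKey());
            if (before != null) {
                out.put(e.getKey(), e.getValue() - before);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("r1", 100L);
        previous.put("r2", 50L);

        Map<String, Long> current = new HashMap<>();
        current.put("r1", 160L);
        current.put("r2", 50L);
        current.put("r3", 7L);

        System.out.println(deltas(previous, current)); // prints {r1=60, r2=0}
    }
}
```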

Please let us know your thoughts. Our primary intention is to address the delay in MR jobs
and reduce their dependency on the Master.

> Make RegionSizeCalculator scalable
> ----------------------------------
>                 Key: HBASE-16169
>                 URL: https://issues.apache.org/jira/browse/HBASE-16169
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, scaling
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-16169.master.000.patch, HBASE-16169.master.001.patch, HBASE-16169.master.002.patch,
> HBASE-16169.master.003.patch, HBASE-16169.master.004.patch, HBASE-16169.master.005.patch,
>
> RegionSizeCalculator is needed for better split generation of MR jobs. This requires
> RegionLoad which can be obtained via ClusterStatus, i.e. accessing Master. We don't want master
> to be in this path.
> The proposal is to add an API to the RegionServer that gets RegionLoad of all regions
> hosted on it or those of a table if specified. RegionSizeCalculator can use the latter.
