hbase-issues mailing list archives

From "Biju Nair (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14215) Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient
Date Wed, 19 Aug 2015 15:11:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703160#comment-14703160 ]

Biju Nair commented on HBASE-14215:
-----------------------------------

Thanks all for the comments. The following is the reasoning, based on my limited understanding,
for setting {{hbase.master.balancer.stochastic.primaryRegionCountCost}} to 10000 when region
replication is greater than 1.

- The high value of 100000 set for {{regionReplicaHostCostKey}} helps reduce or eliminate
multiple replicas of the same region on one host, which in turn improves availability. It
also helps performance, since secondary calls are made to different hosts and the query
load is distributed.
- The function that reduces replicas of the same region on the same rack, which uses the multiplier
{{regionReplicaRackCostKey}}, helps availability but does less for query performance,
since queries are distributed to servers without regard to rack.
- The new function that reduces skew of primary region replicas on servers distributes
the primaries uniformly across all servers, which in turn distributes the query load and improves
performance, since by default all queries are served by primary replicas.

While duplicate replicas on servers are eliminated by the high cost of 100000, which also helps
performance, the next criterion was the balance between rack-level availability and request
performance. By setting {{primaryRegionCountCost}} equal to {{regionReplicaRackCostKey}},
which is 10000, the assumption was that the candidate cluster would be balanced
for both availability and performance. Let me know what was overlooked, as that will help my
understanding.
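
To make the trade-off above concrete, here is a minimal sketch. It assumes the balancer prefers the plan with the lowest sum of multiplier times cost, with each cost function returning a value in [0, 1]. The two plans and their per-function costs are hypothetical numbers chosen for illustration, not measurements from any cluster; only the multipliers 500 and 10000 come from this thread.

```python
# Multipliers quoted in this thread; the weighted-sum model is an assumption.
CURRENT = {"rack": 10000, "primary": 500}
PROPOSED = {"rack": 10000, "primary": 10000}

# Hypothetical normalized costs (0 = ideal, 1 = worst) for two candidate plans.
PLANS = {
    "skewed_primaries": {"rack": 0.00, "primary": 0.10},
    "even_primaries": {"rack": 0.01, "primary": 0.00},
}


def weighted_total(costs, multipliers):
    """Sum multiplier * cost over all cost functions in the plan."""
    return sum(multipliers[name] * cost for name, cost in costs.items())


def preferred(plans, multipliers):
    """Return the name of the plan with the lowest weighted total cost."""
    return min(plans, key=lambda name: weighted_total(plans[name], multipliers))


for label, multipliers in (("current", CURRENT), ("proposed", PROPOSED)):
    print(label, preferred(PLANS, multipliers))
```

Under these assumed numbers, a multiplier of 500 lets a small rack-level penalty outweigh a much larger primary skew, while equal multipliers make the even plan cheaper.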

As suggested, I will try other cost values and update the ticket. Currently we are using site.xml
to vary the costs.
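
For anyone repeating the experiment, a small sketch of generating the override as a Hadoop-style {{property}} element. The property name is the one quoted above; the helper name {{property_element}} and the idea of generating the fragment from a script are illustrative assumptions, and the value shown is just the one under test.

```python
import xml.etree.ElementTree as ET


def property_element(name, value):
    """Build a <property><name>..</name><value>..</value></property> element."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


# Property name as quoted in this comment; 10000 is the value being tested.
prop = property_element(
    "hbase.master.balancer.stochastic.primaryRegionCountCost", 10000
)
fragment = ET.tostring(prop, encoding="unicode")
print(fragment)
```

The printed element would go inside the {{configuration}} element of the site file on the master.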

> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient 
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-14215
>                 URL: https://issues.apache.org/jira/browse/HBASE-14215
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>            Reporter: Biju Nair
>            Priority: Minor
>         Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function {{PrimaryRegionCountSkewCostFunction}}
> to calculate the cost of total primary replica skew doesn't seem to be sufficient to
> prevent skews (refer to HBASE-14110). We would want the default cost to be a higher value
> so that skew in primary region replicas carries a higher cost. The following is the test result
> from setting the multiplier value to 10000 (same as the region replica rack cost multiplier)
> on a 3-rack, 9-RS-node cluster, which seems to get the balancer to distribute the primaries uniformly.
> *Initial primary replica distribution - using the current multiplier*
>  r1n10  102
>  r1n11   85
>  r1n9    88
>  r2n10  120
>  r2n11  120
>  r2n9   124
>  r3n10  135
>  r3n11  124
>  r3n9   129
> *After long duration of reads & writes - using the current multiplier*
>  r1n10  102
>  r1n11   85
>  r1n9    88
>  r2n10  120
>  r2n11  120
>  r2n9   124
>  r3n10  135
>  r3n11  124
>  r3n9   129
> *After manual balancing*
>  r1n10  102
>  r1n11   85
>  r1n9    88
>  r2n10  120
>  r2n11  120
>  r2n9   124
>  r3n10  135
>  r3n11  124
>  r3n9   129
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
>  r1n10  114
>  r1n11  113
>  r1n9   114
>  r2n10  114
>  r2n11  114
>  r2n9   113
>  r3n10  115
>  r3n11  115
>  r3n9   115
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 10000 should help
> general HBase use.
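
The distributions quoted in the issue description can be summarized with a short script. This is only a sketch over the reported numbers: reading a server name such as r1n10 as "rack 1, node 10" is an assumption based on the 3-rack, 9-node description, and max minus min with a population standard deviation is a simple spread measure, not the formula the balancer itself uses.

```python
from statistics import pstdev

# Primary replica counts per region server, as reported in the issue.
before = {"r1n10": 102, "r1n11": 85, "r1n9": 88,
          "r2n10": 120, "r2n11": 120, "r2n9": 124,
          "r3n10": 135, "r3n11": 124, "r3n9": 129}
after = {"r1n10": 114, "r1n11": 113, "r1n9": 114,
         "r2n10": 114, "r2n11": 114, "r2n9": 113,
         "r3n10": 115, "r3n11": 115, "r3n9": 115}


def spread(dist):
    """Difference between the most and least loaded server."""
    counts = list(dist.values())
    return max(counts) - min(counts)


def per_rack(dist):
    """Total primaries per rack, assuming the first two characters name the rack."""
    totals = {}
    for server, count in dist.items():
        rack = server[:2]
        totals[rack] = totals.get(rack, 0) + count
    return totals


for label, dist in (("multiplier 500", before), ("multiplier 10000", after)):
    counts = list(dist.values())
    print(label, "total:", sum(counts), "spread:", spread(dist),
          "stdev:", round(pstdev(counts), 1), "per rack:", per_rack(dist))
```

Both runs hold the same 1027 primaries; the spread between the most and least loaded server drops from 50 to 2.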



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
