hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15058) AssignmentManager should account for unsuccessful split correctly which initially passes quota check
Date Thu, 31 Dec 2015 14:08:49 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-15058:
---------------------------
    Description: 
When a region split doesn't pass the quota check, we see an exception similar to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97):
Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.;
Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a split that passes the quota check may still fail in a later SplitTransactionPhase of stepsBeforePONR().
Currently in branch-1, AssignmentManager#onRegionTransition() does not clearly distinguish among the following
TransitionCodes:
{code}
    case SPLIT_PONR:
    case SPLIT:
    case SPLIT_REVERTED:
      errorMsg =
          onRegionSplit(serverName, code, hri, HRegionInfo.convert(transition.getRegionInfo(1)),
            HRegionInfo.convert(transition.getRegionInfo(2)));
      if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
        try {
          regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles all three of these TransitionCodes. However, errorMsg is normally null, because onRegionSplit()
returns null at the end.
As a result, onRegionSplitReverted() is called not only for SPLIT_REVERTED but also for SPLIT_PONR and SPLIT.
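The effect can be reproduced with a small, self-contained model of the quoted switch. This is a hypothetical sketch, not HBase code: TransitionCode is redeclared as a plain enum, onRegionSplit() is a stub that always returns null (the normal path described above), and onRegionSplitReverted() only counts how often it is invoked.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the branch-1 dispatch quoted above.
 * TransitionCode, onRegionSplit() and onRegionSplitReverted() are stand-ins,
 * not the real HBase types or methods.
 */
public class SplitDispatchDemo {
  enum TransitionCode { SPLIT_PONR, SPLIT, SPLIT_REVERTED }

  // Counts how often the stand-in onRegionSplitReverted() hook fires, per code.
  static final Map<TransitionCode, Integer> revertedCalls = new EnumMap<>(TransitionCode.class);

  // Stub: like the method described in the issue, returns null on the normal path.
  static String onRegionSplit(TransitionCode code) {
    return null;
  }

  static void onRegionSplitReverted(TransitionCode code) {
    revertedCalls.merge(code, 1, Integer::sum);
  }

  static void onRegionTransition(TransitionCode code) {
    switch (code) {
      case SPLIT_PONR:
      case SPLIT:
      case SPLIT_REVERTED:
        String errorMsg = onRegionSplit(code);
        // Same guard as StringUtils.isEmpty(errorMsg): a null or empty message
        // triggers the hook, no matter which of the three codes arrived.
        if (errorMsg == null || errorMsg.isEmpty()) {
          onRegionSplitReverted(code);
        }
        break;
    }
  }

  public static void main(String[] args) {
    for (TransitionCode code : TransitionCode.values()) {
      onRegionTransition(code);
    }
    // prints {SPLIT_PONR=1, SPLIT=1, SPLIT_REVERTED=1}
    System.out.println(revertedCalls);
  }
}
```

In this model the "reverted" hook fires once for every code, including the two that belong to a split that is still progressing or has already completed.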

When a region split fails, AssignmentManager#onRegionTransition() should account for the failure
correctly so that the quota bookkeeping stays consistent.
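One possible shape of the fix, sketched under stated assumptions: each TransitionCode is handled separately, and only SPLIT_REVERTED releases the region-count slot that was reserved when the split passed the quota check. NamespaceQuota, tryReserveSplit() and releaseSplit() are hypothetical names for illustration, not HBase APIs. The sketch also assumes that the region server reports SPLIT_REVERTED after rolling back a split that failed before the PONR, and that the quota counts one extra region per split (one parent replaced by two daughters).

```java
/**
 * Hypothetical sketch: only SPLIT_REVERTED gives back the quota that was
 * reserved at split-approval time. NamespaceQuota and its methods are
 * illustrative stand-ins, not HBase APIs.
 */
public class SplitQuotaSketch {
  enum TransitionCode { SPLIT_PONR, SPLIT, SPLIT_REVERTED }

  // Hypothetical region-count quota for a single namespace.
  static final class NamespaceQuota {
    private final int maxRegions;
    private int regionCount;

    NamespaceQuota(int maxRegions, int regionCount) {
      this.maxRegions = maxRegions;
      this.regionCount = regionCount;
    }

    // Quota check at split-request time: a split adds one region net
    // (one parent becomes two daughters), so reserve that slot up front.
    boolean tryReserveSplit() {
      if (regionCount + 1 > maxRegions) {
        return false;
      }
      regionCount++;
      return true;
    }

    // Undo the reservation when the split is rolled back.
    void releaseSplit() {
      regionCount--;
    }

    int regionCount() {
      return regionCount;
    }
  }

  private final NamespaceQuota quota;

  SplitQuotaSketch(NamespaceQuota quota) {
    this.quota = quota;
  }

  // Proposed dispatch: the codes are no longer lumped together.
  void onRegionTransition(TransitionCode code) {
    switch (code) {
      case SPLIT_PONR:
      case SPLIT:
        // Split is past the PONR or finished: keep the reservation.
        break;
      case SPLIT_REVERTED:
        quota.releaseSplit();
        break;
    }
  }

  public static void main(String[] args) {
    // Successful split: quota check passes, then SPLIT_PONR and SPLIT arrive.
    NamespaceQuota ok = new NamespaceQuota(10, 5);
    SplitQuotaSketch am1 = new SplitQuotaSketch(ok);
    if (ok.tryReserveSplit()) {
      am1.onRegionTransition(TransitionCode.SPLIT_PONR);
      am1.onRegionTransition(TransitionCode.SPLIT);
    }
    System.out.println("after successful split: " + ok.regionCount()); // prints 6

    // Split passes the quota check but fails later in stepsBeforePONR().
    NamespaceQuota failed = new NamespaceQuota(10, 5);
    SplitQuotaSketch am2 = new SplitQuotaSketch(failed);
    if (failed.tryReserveSplit()) {
      am2.onRegionTransition(TransitionCode.SPLIT_REVERTED);
    }
    System.out.println("after reverted split: " + failed.regionCount()); // prints 5
  }
}
```

In this model, the branch-1 behavior would release the slot for SPLIT_PONR and again for SPLIT, so a successful split would leave the count one below its starting value instead of one above it.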

  was:
When a region split doesn't pass the quota check, we see an exception similar to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97):
Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.;
Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a split that passes the quota check may still fail in a later SplitTransactionPhase of stepsBeforePONR().
Currently in branch-1, the distinction among the following states is not clear in AssignmentManager#onRegionTransition():
{code}
    case SPLIT_PONR:
    case SPLIT:
    case SPLIT_REVERTED:
      errorMsg =
          onRegionSplit(serverName, code, hri, HRegionInfo.convert(transition.getRegionInfo(1)),
            HRegionInfo.convert(transition.getRegionInfo(2)));
      if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
        try {
          regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles all three of these TransitionCodes. However, errorMsg is normally null, because onRegionSplit()
returns null at the end.
As a result, onRegionSplitReverted() is called not only for SPLIT_REVERTED but also for SPLIT_PONR and SPLIT.

When a region split fails, AssignmentManager#onRegionTransition() should account for the failure
correctly so that the quota bookkeeping stays consistent.


> AssignmentManager should account for unsuccessful split correctly which initially passes quota check
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15058
>                 URL: https://issues.apache.org/jira/browse/HBASE-15058
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Ted Yu



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
