hbase-issues mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-20990) One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception
Date Wed, 01 Aug 2018 04:20:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564712#comment-16564712 ]

Allan Yang commented on HBASE-20990:

I prefer not returning anything when calling executeProcedure, instead, using reportRegionTransition
and reportProcedureResult to send back the response...
Then you need to record the exceptions in memory and send them back to the master when reporting.
The sync RPC call becomes an async one. What if the RS restarts before sending this info? The
procedure in the master would not even know whether the open/close procedure is executing, or whether
an RPC retry is needed.

> One operation in procedure batch throws an exception will cause all RegionTransitionProcedures
receive the same exception
> -------------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-20990
>                 URL: https://issues.apache.org/jira/browse/HBASE-20990
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
> In AMv2, we batch open/close region operations and call the RS with the executeProcedures API.
But in this API, if one of the region operations throws an exception, all the operations
in the batch receive the same exception, even though some of the operations in the batch
are actually executing normally on the RS.
> I think we should catch exceptions for each operation separately, and call remoteCallFailed or remoteCallCompleted
in RegionTransitionProcedure accordingly.
> Otherwise, there will be some very strange behavior, such as this one:
> {code}
> 2018-07-18 02:56:18,506 WARN  [RSProcedureDispatcher-pool3-t1] assignment.RegionTransitionProcedure(226): Remote call failed e010125048016.bja,60020,1531848989401; pid=8362, ppid=8272, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=IntegrationTestBigLinkedList, region=0beb8ea4e2f239fc082be7cefede1427, target=e010125048016.bja,60020,1531848989401; rit=OPENING, location=e010125048016.bja,60020,1531848989401; exception=NotServingRegionException
> {code}
> The AssignProcedure failed with a NotServingRegionException, which is very strange.
In fact, the AssignProcedure succeeded on the RS; the exception was caused by another
CloseRegion operation that failed in the same operation batch.
> To correct this, we need to modify the response of the executeProcedures API, which is the
ExecuteProceduresResponse proto, to return info (status, exceptions) per operation.
> This issue alone won't cause much trouble, so there is no hurry to change the behavior here,
but we do need to consider it when we refactor AMv2.
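
The idea of catching exceptions per operation and calling remoteCallFailed or remoteCallCompleted for each one could look roughly like the sketch below. It is a minimal, self-contained illustration and not HBase code: the class PerOperationBatch, the OperationResult type, the executeBatch method, and the use of Callable as a stand-in for an open/close region operation are all hypothetical names chosen for this example, and the procedure ids are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these types exist in HBase.
final class PerOperationBatch {

  /** Outcome of a single operation in the batch; a null error means it succeeded. */
  static final class OperationResult {
    final long procId;
    final Exception error;

    OperationResult(long procId, Exception error) {
      this.procId = procId;
      this.error = error;
    }

    boolean isSuccess() {
      return error == null;
    }
  }

  /**
   * Runs every operation in the batch and catches exceptions individually,
   * so one failing operation does not mark the others as failed.
   */
  static List<OperationResult> executeBatch(Map<Long, Callable<Void>> operations) {
    List<OperationResult> results = new ArrayList<>();
    for (Map.Entry<Long, Callable<Void>> entry : operations.entrySet()) {
      try {
        entry.getValue().call();
        results.add(new OperationResult(entry.getKey(), null));
      } catch (Exception e) {
        // Record the failure against this operation only.
        results.add(new OperationResult(entry.getKey(), e));
      }
    }
    return results;
  }

  public static void main(String[] args) {
    Map<Long, Callable<Void>> batch = new LinkedHashMap<>();
    // Stand-in for an open-region operation that succeeds.
    batch.put(1L, () -> null);
    // Stand-in for a close-region operation that fails.
    batch.put(2L, () -> { throw new IllegalStateException("region not serving"); });

    for (OperationResult r : executeBatch(batch)) {
      // The caller would dispatch to remoteCallCompleted or remoteCallFailed here.
      System.out.println("pid=" + r.procId
          + (r.isSuccess() ? " completed" : " failed: " + r.error.getMessage()));
    }
  }
}
```

With a result per operation, the caller can report success for the open that worked and failure only for the close that did not, instead of handing the same exception to every procedure in the batch.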

This message was sent by Atlassian JIRA
