hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-21095) The timeout retry logic for several procedures are broken after master restarts
Date Sun, 26 Aug 2018 17:41:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592977#comment-16592977 ]

Hudson commented on HBASE-21095:

Results for branch branch-2
	[build #1167 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1167/]:
(x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1167//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1167//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1167//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> The timeout retry logic for several procedures are broken after master restarts
> -------------------------------------------------------------------------------
>                 Key: HBASE-21095
>                 URL: https://issues.apache.org/jira/browse/HBASE-21095
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.2.0
>         Attachments: HBASE-21095-branch-2.0.patch, HBASE-21095-v1.patch, HBASE-21095-v2.patch,
>                      HBASE-21095.branch-2.0.001.patch, HBASE-21095.patch
> For TRSP, and also for RTP in branch-2.0 and branch-2.1, if we fail to assign or unassign
> a region, we set the procedure to the WAITING_TIMEOUT state and rely on the ProcedureEvent
> in RegionStateNode (RSN) to wake us up later. But after the master restarts, we neither
> suspend the ProcedureEvent in the RSN nor add the procedure to the ProcedureEvent's
> suspending queue, so the procedure hangs forever because nothing will wake it up.
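
The hang described above can be illustrated with a small toy model. This is only a sketch and does not use HBase code: the names Proc, Event, suspendAndEnqueue, wake and wokenAfterRestart are hypothetical stand-ins for a procedure, a ProcedureEvent and the master's reload path. The reattachOnLoad flag is an assumption used to contrast the two outcomes; it is not a description of what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: none of these types are HBase classes.
class WakeupSketch {
  enum State { RUNNABLE, WAITING_TIMEOUT }

  /** Hypothetical procedure; only its state survives a restart. */
  static final class Proc {
    State state = State.RUNNABLE;
  }

  /** Hypothetical in-memory event: a suspended flag plus a queue of waiters. */
  static final class Event {
    boolean suspended = false;
    final Queue<Proc> waiters = new ArrayDeque<>();

    /** Suspend the event and park the procedure on its queue. */
    void suspendAndEnqueue(Proc p) {
      suspended = true;
      p.state = State.WAITING_TIMEOUT;
      waiters.add(p);
    }

    /** Wake every queued procedure; returns how many were woken. */
    int wake() {
      int woken = 0;
      while (!waiters.isEmpty()) {
        waiters.remove().state = State.RUNNABLE;
        woken++;
      }
      suspended = false;
      return woken;
    }
  }

  /**
   * Simulates a failed assign, a restart, and a later wake-up.
   * Returns true if the reloaded procedure becomes RUNNABLE again.
   */
  static boolean wokenAfterRestart(boolean reattachOnLoad) {
    // Before the restart: the procedure parks itself on the event.
    Event before = new Event();
    Proc original = new Proc();
    before.suspendAndEnqueue(original);

    // Restart: the persisted state is reloaded, but the in-memory
    // event is rebuilt empty and not suspended.
    Proc reloaded = new Proc();
    reloaded.state = original.state;
    Event after = new Event();
    if (reattachOnLoad) {
      after.suspendAndEnqueue(reloaded); // assumption: re-register on load
    }

    // Someone later wakes the event; only queued procedures are resumed.
    after.wake();
    return reloaded.state == State.RUNNABLE;
  }

  public static void main(String[] args) {
    System.out.println("no re-attach, woken: " + wokenAfterRestart(false));
    System.out.println("re-attach, woken:    " + wokenAfterRestart(true));
  }
}
```

In this model the first call reports false: the reloaded procedure is still in WAITING_TIMEOUT, but the fresh event has an empty queue, so the wake-up never reaches it. The second call reports true only because the procedure was put back on the queue during reload.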

This message was sent by Atlassian JIRA