hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-20038) TestLockProcedure.testTimeout is flakey
Date Wed, 21 Feb 2018 23:41:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372220#comment-16372220
] 

Hadoop QA commented on HBASE-20038:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue}
Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color}
| {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  7s{color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 52s{color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 14s{color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 46s{color}
| {color:green} branch has no errors when building our shaded downstream artifacts. {color}
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} |
{color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 34s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 50s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 12s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 16s{color}
| {color:green} patch has no errors when building our shaded downstream artifacts. {color}
|
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 21m 15s{color}
| {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color}
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 48s{color} | {color:green}
hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 26s{color} | {color:black}
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20038 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911423/HBASE-20038.patch
|
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti
 checkstyle  compile  |
| uname | Linux dad347901de9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018
x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
|
| git revision | master / 5d994a24fc |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
|
| Default Java | 1.8.0_151 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11604/testReport/ |
| Max. process+thread count | 5346 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11604/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> TestLockProcedure.testTimeout is flakey
> ---------------------------------------
>
>                 Key: HBASE-20038
>                 URL: https://issues.apache.org/jira/browse/HBASE-20038
>             Project: HBase
>          Issue Type: Sub-task
>          Components: proc-v2, test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-20038.patch, HBASE-20038.patch, HBASE-20038.patch
>
>
> The test is simple so it is easy to find out the problem.
> {noformat}
> 2018-02-21 04:53:32,230 INFO  [Time-limited test] hbase.ResourceChecker(148): before:
master.locking.TestLockProcedure#testTimeout Thread=218, OpenFileDescriptor=853, MaxFileDescriptor=60000,
SystemLoadAverage=5075, ProcessCount=312, AvailableMemoryMB=5373
> 2018-02-21 04:53:32,234 WARN  [Time-limited test] procedure2.ProcedureTestingUtility(146):
Set Kill before store update to: false
> 2018-02-21 04:53:32,278 DEBUG [Time-limited test] procedure2.ProcedureExecutor(866):
Stored pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:32,285 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat
pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:32,286 DEBUG [PEWorker-1] locking.LockProcedure(312): LOCKED pid=14,
state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:32,303 DEBUG [Time-limited test] locking.TestLockProcedure(204): Proc
id 14 acquired lock.
> 2018-02-21 04:53:32,298 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE; timeout=2000, timestamp=1519188814298
> 2018-02-21 04:53:33,303 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:33,304 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc
id 14 : LOCKED.
> 2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure
ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake
on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:34,299 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE; timeout=2000, timestamp=1519188816299
> 2018-02-21 04:53:34,306 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:34,306 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc
id 14 : LOCKED.
> 2018-02-21 04:53:34,766 WARN  [HBase-Metrics2-1] impl.MetricsConfig(125): Cannot locate
configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure
ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake
on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:36,299 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE; timeout=2000, timestamp=1519188818299
> 2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure
ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake
on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
namespace=namespace, type=EXCLUSIVE]
> 2018-02-21 04:53:38,310 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat
pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:38,310 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED
pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE; timeout=2000, timestamp=1519188820310
> 2018-02-21 04:53:38,311 WARN  [Time-limited test] procedure2.ProcedureTestingUtility(146):
Set Kill before store update to: false
> 2018-02-21 04:53:38,342 DEBUG [PEWorker-1] locking.LockProcedure(242): UNLOCKED pid=14,
state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE
> 2018-02-21 04:53:38,366 INFO  [PEWorker-1] procedure2.ProcedureExecutor(1247): Finished
pid=14, state=SUCCESS; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
type=EXCLUSIVE in 6.0920sec
> 2018-02-21 04:53:38,628 INFO  [Time-limited test] hbase.ResourceChecker(172): after:
master.locking.TestLockProcedure#testTimeout Thread=209 (was 218), OpenFileDescriptor=851
(was 853), MaxFileDescriptor=60000 (was 60000), SystemLoadAverage=5045 (was 5075), ProcessCount=314
(was 312) - ProcessCount LEAK? -, AvailableMemoryMB=5071 (was 5373)
> {noformat}
> The problem is that, sleep 2 * HEARTBEAT_TIMEOUT is not safe. It is possible that after
the first TIMEOUT the procedure has not been timeout yet and then the check for unlocked and
the actual timeout processing will arrive at almost the same time. You can see the above log,
we are going to mark the procedure as timeout at 04:53:38,299 and the heartbeat for checking
unlock arrives at 04:53:38,310.
> I think increase the wait time can solve the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message