hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16721) Concurrency issue in WAL unflushed seqId tracking
Date Thu, 29 Sep 2016 04:28:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531747#comment-15531747
] 

Hadoop QA commented on HBASE-16721:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 11s {color}
| {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2
2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 12s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} |
{color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 11s {color} | {color:red}
hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 23s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
|   | org.apache.hadoop.hbase.client.TestFromClientSide |
|   | org.apache.hadoop.hbase.client.TestTableSnapshotScanner |
|   | org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient |
|   | org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12830834/hbase-16721_v2.master.patch
|
| JIRA Issue | HBASE-16721 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle
 compile  |
| uname | Linux da73c8acacde 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
|
| git revision | master / 09a31bd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/artifact/patchprocess/patch-unit-hbase-server.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/3766/artifact/patchprocess/patch-unit-hbase-server.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Concurrency issue in WAL unflushed seqId tracking
> -------------------------------------------------
>
>                 Key: HBASE-16721
>                 URL: https://issues.apache.org/jira/browse/HBASE-16721
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
>         Attachments: hbase-16721_v1.branch-1.patch, hbase-16721_v2.branch-1.patch, hbase-16721_v2.master.patch
>
>
> I'm inspecting an interesting case where in a production cluster, some regionservers
ends up accumulating hundreds of WAL files, even with force flushes going on due to max logs.
This happened multiple times on the cluster, but not on other clusters. The cluster has periodic
memstore flusher disabled, however, this still does not explain why the force flush of regions
due to max limit is not working. I think the periodic memstore flusher just masks the underlying
problem, which is why we do not see this in other clusters. 
> The problem starts like this: 
> {code}
> 2016-09-21 17:49:18,272 INFO  [regionserver//10.2.0.55:16020.logRoller] wal.FSHLog: Too
many wals: logs=33, maxlogs=32; forcing flush of 1 regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN  [regionserver//10.2.0.55:16020.logRoller] regionserver.LogRoller:
Failed to schedule flush of d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> then, it continues until the RS is restarted: 
> {code}
> 2016-09-23 17:43:49,356 INFO  [regionserver//10.2.0.55:16020.logRoller] wal.FSHLog: Too
many wals: logs=721, maxlogs=32; forcing flush of 1 regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN  [regionserver//10.2.0.55:16020.logRoller] regionserver.LogRoller:
Failed to schedule flush of d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} is already split some
time ago, and was able to flush its data and split without any problems. However, the FSHLog
still thinks that there is some unflushed data for this region. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message