hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16464) archive folder grows bigger and bigger due to corrupt snapshot under tmp dir
Date Mon, 22 Aug 2016 10:49:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430544#comment-15430544
] 

Hadoop QA commented on HBASE-16464:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 33s {color}
| {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} |
{color:green} branch-1.1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} |
{color:green} branch-1.1 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color}
| {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color}
| {color:green} branch-1.1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 44s {color} | {color:red}
hbase-server in branch-1.1 has 80 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s {color} | {color:red}
hbase-server in branch-1.1 failed with JDK v1.8.0_101. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} |
{color:green} branch-1.1 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} |
{color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} |
{color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 31s {color} | {color:red}
The patch causes 11 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 45s {color} | {color:red}
The patch causes 11 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 55s {color} | {color:red}
The patch causes 11 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 11m 5s {color} | {color:red}
The patch causes 11 errors with Hadoop v2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 15s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s {color} |
{color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 28s {color} | {color:red}
hbase-server in the patch failed with JDK v1.8.0_101. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} |
{color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 54s {color} | {color:red}
hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 119m 58s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.util.TestHBaseFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-22 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12824802/HBASE-16464-branch-1.1.patch
|
| JIRA Issue | HBASE-16464 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle
 compile  |
| uname | Linux 9903eae3d7cd 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.1 / 1c92f4c |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
|
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
|
| javadoc | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/artifact/patchprocess/branch-javadoc-hbase-server-jdk1.8.0_101.txt
|
| javadoc | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/artifact/patchprocess/patch-javadoc-hbase-server-jdk1.8.0_101.txt
|
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/artifact/patchprocess/patch-unit-hbase-server.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/3191/artifact/patchprocess/patch-unit-hbase-server.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3191/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> archive folder grows bigger and bigger due to corrupt snapshot under tmp dir
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-16464
>                 URL: https://issues.apache.org/jira/browse/HBASE-16464
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Heng Chen
>            Assignee: Heng Chen
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 1.2.3
>
>         Attachments: HBASE-16464-branch-1.1.patch, HBASE-16464.patch
>
>
> We met the problem on our real production cluster,  we need to cleanup some data on hbase,
 we notice the archive folder is much larger than others,  so we delete all snapshots of all
tables,  but the archive folder still grows bigger and bigger. 
> After check the hmaster log, we notice the exception below:
> {code}
> 2016-08-22 15:34:33,089 ERROR [f04,16000,1471240833208_ChoreService_1] snapshot.SnapshotHFileCleaner:
Exception while checking if files were valid, keeping them just in case.
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info
from:hdfs://f04/hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo
>         at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:295)
>         at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: File does not exist: /hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> {code}
> It means when SnapshotHFileCleaner begin to cleanup the archive folder, it reads the
the snapshot dir to check if any links to hfiles exist, but when read the file /.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo,
corrupt exception thrown out (not sure why the file not found),  and cleanup will be failed.
> When i check the /.hbase-snapshot/.tmp/frog_stastic_2016-08-17, i can see there is only
one file exist /hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/region-manifest.8e3179c388e10770eba7d35e30f2777f,
 /hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo missed. 
> I think we should catch up the exception and delete the file to ensure cleanup will go
on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message