Date: Mon, 11 Sep 2017 16:13:02 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161492#comment-16161492 ]

Hadoop QA commented on HBASE-18771:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 13s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 17m 55s{color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}393m 13s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 3m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}431m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAdmin2 |
| | hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort |
| | hadoop.hbase.snapshot.TestRegionSnapshotTask |
| | hadoop.hbase.client.TestAdmin1 |
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
| | org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS |
| | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat |
| | org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile |
| | org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy |
| | org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers |
| | org.apache.hadoop.hbase.master.TestMasterFailover |
| | org.apache.hadoop.hbase.mapred.TestTableInputFormat |
| | org.apache.hadoop.hbase.filter.TestFilterWrapper |
| | org.apache.hadoop.hbase.mapred.TestTableMapReduceUtil |
| | org.apache.hadoop.hbase.master.TestGetInfoPort |
| | org.apache.hadoop.hbase.coprocessor.TestMasterObserver |
| | org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole |
| | org.apache.hadoop.hbase.client.TestMetaWithReplicas |
| | org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort |
| | org.apache.hadoop.hbase.util.TestHBaseFsckEncryption |
| | org.apache.hadoop.hbase.client.TestFromClientSide3 |
| | org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithCustomVisLabService |
| | org.apache.hadoop.hbase.master.TestTableLockManager |
| | org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS |
| | org.apache.hadoop.hbase.TestClusterBootOrder |
| | org.apache.hadoop.hbase.client.TestScannerTimeout |
| | org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap |
| | org.apache.hadoop.hbase.TestJMXConnectorServer |
| | org.apache.hadoop.hbase.util.TestMiniClusterLoadEncoded |
| | org.apache.hadoop.hbase.client.TestSplitOrMergeStatus |
| | org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential |
| | org.apache.hadoop.hbase.regionserver.wal.TestLogRolling |
| | org.apache.hadoop.hbase.regionserver.throttle.TestCompactionWithThroughputController |
| | org.apache.hadoop.hbase.client.TestIncrementFromClientSideWithCoprocessor |
| | org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
| | org.apache.hadoop.hbase.coprocessor.TestCoprocessorTableEndpoint |
| | org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot |
| | org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint |
| | org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster |
| | org.apache.hadoop.hbase.TestAcidGuarantees |
| | org.apache.hadoop.hbase.replication.TestReplicationEndpoint |
| | org.apache.hadoop.hbase.TestGlobalMemStoreSize |
| | org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool |
| | org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor |
| | org.apache.hadoop.hbase.snapshot.TestExportSnapshot |
| | org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion |
| | org.apache.hadoop.hbase.replication.multiwal.TestReplicationEndpointWithMultipleWAL |
| | org.apache.hadoop.hbase.replication.TestReplicationSyncUpToolWithBulkLoadedData |
| | org.apache.hadoop.hbase.TestInfoServers |
| | org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithTwoDifferentZKClusters |
| | org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase |
| | org.apache.hadoop.hbase.replication.TestReplicationWithTags |
| | org.apache.hadoop.hbase.wal.TestWALOpenAfterDNRollingStart |
| | org.apache.hadoop.hbase.master.TestRestartCluster |
| | org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel |
| | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureEvents |
| | org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed |
| | org.apache.hadoop.hbase.TestHColumnDescriptorDefaultVersions |
| | org.apache.hadoop.hbase.client.TestSmallReversedScanner |
| | org.apache.hadoop.hbase.client.TestTableSnapshotScanner |
| | org.apache.hadoop.hbase.fs.TestBlockReorder |
| | org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide |
| | org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient |
| | org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient |
| | org.apache.hadoop.hbase.client.TestMultiParallel |
| | org.apache.hadoop.hbase.wal.TestDefaultWALProvider |
| | org.apache.hadoop.hbase.TestMetaTableAccessor |
| | org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithClusters |
| | org.apache.hadoop.hbase.replication.TestPerTableCFReplication |
| | org.apache.hadoop.hbase.replication.TestMultiSlaveReplication |
| | org.apache.hadoop.hbase.master.handler.TestEnableTableHandler |
| | org.apache.hadoop.hbase.master.handler.TestCreateTableHandler |
| | org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover |
| | org.apache.hadoop.hbase.wal.TestWALFactory |
| | org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2 |
| | org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1 |
| | org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat |
| | org.apache.hadoop.hbase.replication.TestMasterReplication |
| | org.apache.hadoop.hbase.TestPartialResultsFromClientSide |
| | org.apache.hadoop.hbase.client.TestSnapshotFromClient |
| | org.apache.hadoop.hbase.replication.TestReplicationSource |
| | org.apache.hadoop.hbase.io.TestFileLink |
| | org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat |
| | org.apache.hadoop.hbase.tool.TestCanaryTool |
| | org.apache.hadoop.hbase.util.TestFSUtils |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886373/HBASE-18771.branch-1.3.004.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 89b65b085cab 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / ae6ff50 |
| Default Java | 1.7.0_151 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_144 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_151 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/8556/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/8556/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/8556/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> Incorrect StoreFileRefresh leading to split and compaction failures
> -------------------------------------------------------------------
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, HBASE-18771.master.002.patch
>
>
> We ran into compaction and split failures with 1.3, similar to HBASE-18186 and HBASE-17406. Here's what I believe is happening:
> Let's say we have 4 store files that are compacted to form a new one. At this point we have 5 store files; however, only 1 (the newly formed one) is open for the store, and the rest are waiting to be archived by HFileArchiver.
> Now, before the files are archived, we get a FileNotFoundException (FNFE) in a scanner.
> This results in HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) being called, which in turn calls region.refreshStoreFiles(true) -> HStore.refreshStoreFiles().
> HStore.refreshStoreFiles() now checks the HDFS directory and adds the previously compacted files back to the store, even though these files are still present in StoreFileManager's compactedFiles list. At this point HFileArchiver runs, checks the compactedFiles list, and moves these files into the archive directory.
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] regionserver.CompactSplitThread - Compaction selection failed regionName = xxxx, storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://xxxx
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
>     at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
>     at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
>     at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
>     at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly, if a split happens after archival, we fail after the point of no return (PONR) while opening the daughter regions, due to an FNFE. This leaves the parent offline and the daughters in limbo, since they are unable to open. Since we get the error after the PONR, we also end up aborting the region server (RS).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
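
The sequence described in the quoted report (compaction, then a store file refresh, then archival) can be modeled with a few sets. The Java sketch below is an illustration only: ToyStore, its three sets, and all method names are hypothetical stand-ins, not HBase's HStore, StoreFileManager, or HFileArchiver. The guarded refresh is an assumption about one conceivable way to avoid the problem and is not necessarily what the attached patches do.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the reported sequence. All names are hypothetical;
// this is not HBase's HStore, StoreFileManager, or HFileArchiver.
final class ToyStore {
    // Stands in for the store directory on HDFS.
    final Set<String> filesOnDisk = new LinkedHashSet<>();
    // Files the store currently reads from.
    final Set<String> openFiles = new LinkedHashSet<>();
    // Files already compacted away and waiting for archival.
    final Set<String> compactedFiles = new LinkedHashSet<>();

    void addFile(String name) {
        filesOnDisk.add(name);
        openFiles.add(name);
    }

    // Inputs are replaced by one output but stay on disk until archived.
    void compact(Collection<String> inputs, String output) {
        filesOnDisk.add(output);
        openFiles.removeAll(inputs);
        openFiles.add(output);
        compactedFiles.addAll(inputs);
    }

    // Refresh as described in the report: re-open every file in the directory.
    void refreshNaive() {
        openFiles.addAll(filesOnDisk);
    }

    // Assumed guard: skip files that are queued for archival.
    void refreshSkippingCompacted() {
        for (String f : filesOnDisk) {
            if (!compactedFiles.contains(f)) {
                openFiles.add(f);
            }
        }
    }

    // Archiver step: move compacted files out of the store directory.
    void archive() {
        filesOnDisk.removeAll(compactedFiles);
        compactedFiles.clear();
    }

    // Open files that no longer exist on disk; reading one would fail with an FNFE.
    Set<String> danglingOpenFiles() {
        Set<String> dangling = new LinkedHashSet<>(openFiles);
        dangling.removeAll(filesOnDisk);
        return dangling;
    }

    // Runs compaction -> refresh -> archive and reports what is left dangling.
    static Set<String> demo(boolean guarded) {
        ToyStore store = new ToyStore();
        for (String f : Arrays.asList("f1", "f2", "f3", "f4")) {
            store.addFile(f);
        }
        store.compact(Arrays.asList("f1", "f2", "f3", "f4"), "f5");
        if (guarded) {
            store.refreshSkippingCompacted();
        } else {
            store.refreshNaive();
        }
        store.archive();
        return store.danglingOpenFiles();
    }

    public static void main(String[] args) {
        System.out.println("naive refresh:   " + demo(false));
        System.out.println("guarded refresh: " + demo(true));
    }
}
```

With the naive refresh, the four compacted inputs f1 to f4 end up open but missing from disk, which matches the state in which compaction selection or a daughter region open would hit an FNFE. With the guarded refresh, no open file is left dangling in this model.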