Date: Wed, 2 Dec 2015 07:40:11 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

[ https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035415#comment-15035415 ]

Hudson commented on HBASE-14223:
--------------------------------

FAILURE: Integrated in HBase-1.2 #417 (See [https://builds.apache.org/job/HBase-1.2/417/])
Revert "HBASE-14223 Meta WALs are not cleared if meta region was closed (stack: rev 2566b6d4891fcb3666351154cabfd60c9985071e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/StressAssignmentManagerMonkeyFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/NoKillMonkeyFactory.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/SlowDeterministicMonkeyFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestMetaWALsAreClosed.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/MoveMetaAction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java


> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, hbase-14223_v3-master.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed. The last WAL file just sits there in the RS WAL directory. If the RS stops gracefully, the WAL file for meta is deleted. Otherwise, if the RS aborts, the WAL for meta is not cleaned up.
> It is also not split (which is correct) since master determines that the RS no longer hosts meta at the time of RS abort.
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} directories left uncleaned:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14 /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta:
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time:
> {code}
> 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740] zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created:
> {code}
> 2015-06-05 03:15:11,707 INFO [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta with entries=385, filesize=196.88 KB; new WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM later killed the region server, the master did not see these WAL files:
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: started splitting 2 logs in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507 WARN [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: finished splitting (more than or equal to) 129135000 bytes in 2 log files in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] in 4433ms
> {code}



--
This message was sent by Atlassian JIRA (v6.3.4#6332)
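
To check a cluster for the leftover files described in the issue, one option is to filter `hadoop fs -ls` output for `.meta` files that sit under a `-splitting` directory. The following is a minimal sketch in Python, assuming the listing format shown in the issue description. The function name `leftover_meta_wals` and the regular expression are illustrative and are not part of HBase or Hadoop; the sample input is copied from the listing quoted above.

```python
import re

# One entry of `hadoop fs -ls` output, as it appears in the issue description:
# permissions, replication, owner, group, size, date, time, path.
LS_LINE = re.compile(
    r"^(?P<perms>[d-][rwxsStT-]{9}\+?)\s+(?P<repl>\S+)\s+(?P<owner>\S+)\s+"
    r"(?P<group>\S+)\s+(?P<size>\d+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<path>.+)$"
)


def leftover_meta_wals(ls_output):
    """Return (path, size) for each .meta file directly under a -splitting directory."""
    found = []
    for line in ls_output.splitlines():
        match = LS_LINE.match(line.strip())
        if match is None:
            continue  # "Found N items", shell prompts, blank lines
        if match["perms"].startswith("d"):
            continue  # directories are not WAL files
        path = match["path"]
        parent, _, name = path.rpartition("/")
        if parent.endswith("-splitting") and name.endswith(".meta"):
            found.append((path, int(match["size"])))
    return found


# Sample input taken from the listing in the issue description.
SAMPLE = """
Found 2 items
-rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
-rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
"""

if __name__ == "__main__":
    for path, size in leftover_meta_wals(SAMPLE):
        print(size, path)
```

The sketch only reports candidates. Whether a reported file is safe to remove depends on whether meta was still hosted by that RS when it went down, which the listing alone does not show.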