Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9B816200CB2 for ; Sun, 11 Jun 2017 02:17:25 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 9A1D7160BE9; Sun, 11 Jun 2017 00:17:25 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id B7AA1160BDA for ; Sun, 11 Jun 2017 02:17:24 +0200 (CEST) Received: (qmail 902 invoked by uid 500); 11 Jun 2017 00:17:23 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 891 invoked by uid 99); 11 Jun 2017 00:17:23 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 11 Jun 2017 00:17:23 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 7617B1A028D for ; Sun, 11 Jun 2017 00:17:23 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id Z-OtVVxDuBOr for ; Sun, 11 Jun 2017 00:17:21 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 16D435FCD7 for ; Sun, 11 Jun 2017 00:17:21 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id EC6A1E0DF2 for ; Sun, 11 Jun 2017 00:17:19 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 917F121E29 for ; Sun, 11 Jun 2017 00:17:18 +0000 (UTC) Date: Sun, 11 Jun 2017 00:17:18 +0000 (UTC) From: "Nick Dimiduk (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Sun, 11 Jun 2017 00:17:25 -0000 [ https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-14223: --------------------------------- Fix Version/s: (was: 1.1.11) 1.1.12 > Meta WALs are not cleared if meta region was closed and RS aborts > ----------------------------------------------------------------- > > Key: HBASE-14223 > URL: https://issues.apache.org/jira/browse/HBASE-14223 > Project: HBase > Issue Type: Bug > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7, 1.1.12 > > Attachments: HBASE-14223logs, hbase-14223_v0.patch, hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, hbase-14223_v3-master.patch > > > When an RS opens meta, and later closes it, the WAL(FSHlog) is not closed. The last WAL file just sits there in the RS WAL directory. If RS stops gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for meta is not cleaned. It is also not split (which is correct) since master determines that the RS no longer hosts meta at the time of RS abort. > From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} directories left uncleaned: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs > Found 31 items > drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14 /apps/hbase/data/WALs/hregion-58203265 > drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting > ... > {code} > The directories contain WALs from meta: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting > Found 2 items > -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta > -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > The RS hosted the meta region for some time: > {code} > 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740] zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285 > ... > 2015-06-05 03:15:17,302 INFO [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed hbase:meta,,1.1588230740 > {code} > In between, a WAL is created: > {code} > 2015-06-05 03:15:11,707 INFO [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta with entries=385, filesize=196.88 KB; new WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > When CM killed the region server later master did not see these WAL files: > {code} > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: started splitting 2 logs in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285] > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436 > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329 > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507 WARN [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508 INFO [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] master.SplitLogManager: finished splitting (more than or equal to) 129135000 bytes in 2 log files in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting] in 4433ms > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)