Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Date: Thu, 23 Apr 2015 16:10:38 +0000 (UTC)
From: "Sean Busbey (JIRA)" <jira@apache.org>
To: dev@hbase.apache.org
Message-ID: <JIRA.12823456.1429805406000.7850.1429805438630@Atlassian.JIRA>
In-Reply-To: <JIRA.12823456.1429805406000@Atlassian.JIRA>
References: <JIRA.12823456.1429805406000@Atlassian.JIRA>
 <JIRA.12823456.1429805406393@arcas>
Subject: [jira] [Created] (HBASE-13539) Clean up empty WAL directories
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Sean Busbey created HBASE-13539:
-----------------------------------

             Summary: Clean up empty WAL directories
                 Key: HBASE-13539
                 URL: https://issues.apache.org/jira/browse/HBASE-13539
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 1.0.0
            Reporter: Sean Busbey
            Priority: Minor


On HMaster startup, we look for WAL directories that can indicate the need for recovery. If there are files in a WAL directory, we go through the whole recovery process and eventually delete the directory. However, if the directory is empty, we skip over it as a non-error condition.
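To make the current behavior concrete, here is a minimal sketch of that startup scan. It assumes a local directory stands in for `/hbase/WALs` on HDFS, so it uses `java.nio.file` rather than Hadoop's `FileSystem`. The names `WalDirScan` and `scan` are hypothetical, and this is not HBase's actual master code; it only shows the split between directories that go through recovery and empty ones that are skipped and left behind.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the master's startup WAL scan; a local dir stands in for HDFS. */
final class WalDirScan {

  /** Directories that need log splitting vs. empty ones that are skipped. */
  static final class ScanResult {
    final List<Path> needsRecovery = new ArrayList<>();
    final List<Path> emptySkipped = new ArrayList<>();
  }

  static ScanResult scan(Path walRoot) throws IOException {
    ScanResult result = new ScanResult();
    try (DirectoryStream<Path> serverDirs = Files.newDirectoryStream(walRoot)) {
      for (Path dir : serverDirs) {
        if (!Files.isDirectory(dir)) {
          continue; // only per-server directories matter here
        }
        if (isEmpty(dir)) {
          result.emptySkipped.add(dir);   // current behavior: skipped, never deleted
        } else {
          result.needsRecovery.add(dir);  // split the WALs, then delete the directory
        }
      }
    }
    return result;
  }

  static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }
}
```

Every directory that lands in `emptySkipped` survives the restart, which is how the pile of old directories builds up.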

I think the intention for the empty ones is that we could just reuse them. Unfortunately, since our WAL directory names include a server-start timestamp, we never reuse them and instead keep around a bunch of old directories.
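The directory names in the listing that follows have the form `<host>,<port>,<start timestamp>`, optionally with a `-splitting` suffix while recovery is in progress. A minimal parsing sketch, assuming only that format as observed in the listing (the `WalDirName` class is hypothetical, not an HBase API), shows why a restarted RegionServer never lands in its old directory: host and port match, but the start timestamp does not.

```java
import java.time.Instant;

/**
 * Hypothetical helper (not an HBase API) that parses a per-server WAL directory
 * name of the form "<host>,<port>,<startcode>[-splitting]", as seen under /hbase/WALs.
 */
final class WalDirName {
  static final String SPLITTING_SUFFIX = "-splitting";

  final String host;
  final int port;
  final long startCode;     // RegionServer start time, epoch milliseconds
  final boolean splitting;  // true while the master is recovering this directory

  private WalDirName(String host, int port, long startCode, boolean splitting) {
    this.host = host;
    this.port = port;
    this.startCode = startCode;
    this.splitting = splitting;
  }

  static WalDirName parse(String dirName) {
    boolean splitting = dirName.endsWith(SPLITTING_SUFFIX);
    String base = splitting
        ? dirName.substring(0, dirName.length() - SPLITTING_SUFFIX.length())
        : dirName;
    String[] parts = base.split(",");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Unexpected WAL directory name: " + dirName);
    }
    return new WalDirName(parts[0], Integer.parseInt(parts[1]),
        Long.parseLong(parts[2]), splitting);
  }

  /** When the RegionServer that owns this directory started. */
  Instant startTime() {
    return Instant.ofEpochMilli(startCode);
  }

  /** Same host and port, ignoring the start timestamp. */
  boolean sameHostAndPort(WalDirName other) {
    return host.equals(other.host) && port == other.port;
  }
}
```

For instance, `...,22101,1428202830692` and `...,22101,1429726649267` from the listing satisfy `sameHostAndPort` yet differ in `startCode`, so they are distinct directories and nothing at startup merges or reuses them.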

For example, this server only runs 1 RS, and it has been through some issues:

{code}
[busbey@edge ~]$ sudo -u hdfs hdfs dfs -ls -d /hbase/WALs/rack03server22.hbase.example.com*
drwxrwxrwx   - hbase hbase          0 2015-04-04 20:16 /hbase/WALs/rack03server22.hbase.example.com,22101,1428202830692
drwxrwxrwx   - hbase hbase          0 2015-04-05 02:54 /hbase/WALs/rack03server22.hbase.example.com,22101,1428204146406
drwxr-xr-x   - hbase hbase          0 2015-04-06 14:20 /hbase/WALs/rack03server22.hbase.example.com,22101,1428227900589
drwxr-xr-x   - hbase hbase          0 2015-04-07 13:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1428355397531
drwxr-xr-x   - hbase hbase          0 2015-04-08 10:12 /hbase/WALs/rack03server22.hbase.example.com,22101,1428438216546
drwxr-xr-x   - hbase hbase          0 2015-04-08 12:30 /hbase/WALs/rack03server22.hbase.example.com,22101,1428513527999
drwxr-xr-x   - hbase hbase          0 2015-04-10 07:40 /hbase/WALs/rack03server22.hbase.example.com,22101,1428521782656
drwxr-xr-x   - hbase hbase          0 2015-04-10 08:23 /hbase/WALs/rack03server22.hbase.example.com,22101,1428677010976
drwxr-xr-x   - hbase hbase          0 2015-04-10 08:53 /hbase/WALs/rack03server22.hbase.example.com,22101,1428679573094
drwxr-xr-x   - hbase hbase          0 2015-04-13 10:26 /hbase/WALs/rack03server22.hbase.example.com,22101,1428681379039
drwxr-xr-x   - hbase hbase          0 2015-04-19 15:28 /hbase/WALs/rack03server22.hbase.example.com,22101,1428946164686
drwxr-xr-x   - hbase hbase          0 2015-04-19 15:36 /hbase/WALs/rack03server22.hbase.example.com,22101,1429482692579
drwxr-xr-x   - hbase hbase          0 2015-04-21 15:43 /hbase/WALs/rack03server22.hbase.example.com,22101,1429652628679-splitting
drwxr-xr-x   - hbase hbase          0 2015-04-22 07:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429665239905
drwxr-xr-x   - hbase hbase          0 2015-04-22 08:04 /hbase/WALs/rack03server22.hbase.example.com,22101,1429714674479
drwxr-xr-x   - hbase hbase          0 2015-04-22 08:37 /hbase/WALs/rack03server22.hbase.example.com,22101,1429715217130
drwxr-xr-x   - hbase hbase          0 2015-04-22 10:28 /hbase/WALs/rack03server22.hbase.example.com,22101,1429717221567
drwxr-xr-x   - hbase hbase          0 2015-04-22 11:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429723761988
drwxr-xr-x   - hbase hbase          0 2015-04-23 08:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1429726649267
[busbey@edge ~]$ 
{code}

Most of those are empty, left over from previous clean restarts. It does still have:

* 1 current WAL that it's using for current operations
* 1 previous WAL that is in recovery
* 1 previous WAL from a failure that hasn't been recognized yet (it restarted again while all masters were down)

Those are easy to see in the non-empty directories:

{code}
[busbey@edge ~]$ sudo -u hdfs hdfs dfs -ls /hbase/WALs/rack03server22.hbase.example.com*
Found 1 items
-rw-r--r--   3 hbase hbase         83 2015-04-21 15:43 /hbase/WALs/rack03server22.hbase.example.com,22101,1429652628679-splitting/rack03server22.hbase.example.com%2C22101%2C1429652628679.default.1429656231067
Found 1 items
-rw-r--r--   3 hbase hbase         83 2015-04-22 07:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429665239905/rack03server22.hbase.example.com%2C22101%2C1429665239905.default.1429712050345
Found 1 items
-rw-r--r--   3 hbase hbase         83 2015-04-23 08:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1429726649267/rack03server22.hbase.example.com%2C22101%2C1429726649267.default.1429802256366
[busbey@edge ~]$
{code}

So maybe we need an additional cleanup action, run on becoming active master, that removes empty previous WAL directories.
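One possible shape for that cleanup, sketched under assumptions: a local directory again stands in for HDFS, and `liveServerDirNames` is a hypothetical stand-in for whatever the active master knows about currently online RegionServers. Live servers are skipped because a freshly started RS may briefly have an empty WAL directory. The delete is non-recursive, so a directory that still holds WALs (or gains one between the listing and the delete) is left for the normal recovery path. `EmptyWalDirCleaner` and `cleanEmptyWalDirs` are illustrative names only.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical cleanup step for a newly active master; a local dir stands in for HDFS. */
final class EmptyWalDirCleaner {

  static List<Path> cleanEmptyWalDirs(Path walRoot, Set<String> liveServerDirNames)
      throws IOException {
    // Snapshot candidate per-server directories first so we never delete while iterating.
    List<Path> candidates = new ArrayList<>();
    try (DirectoryStream<Path> serverDirs = Files.newDirectoryStream(walRoot)) {
      for (Path dir : serverDirs) {
        boolean live = liveServerDirNames.contains(dir.getFileName().toString());
        if (Files.isDirectory(dir) && !live) {
          candidates.add(dir);
        }
      }
    }
    List<Path> removed = new ArrayList<>();
    for (Path dir : candidates) {
      try {
        Files.delete(dir);  // non-recursive: only succeeds if the directory is empty
        removed.add(dir);
      } catch (DirectoryNotEmptyException e) {
        // Still holds WAL files; leave it for log splitting / recovery.
      }
    }
    return removed;
  }
}
```

On HDFS the analogous operation would be a non-recursive `FileSystem#delete(path, false)`, which should likewise refuse to remove a non-empty directory.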


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)