Date: Thu, 23 Apr 2015 16:10:38 +0000 (UTC)
From: "Sean Busbey (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-13539) Clean up empty WAL directories

Sean Busbey created HBASE-13539:
-----------------------------------

             Summary: Clean up empty WAL directories
                 Key: HBASE-13539
                 URL: https://issues.apache.org/jira/browse/HBASE-13539
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 1.0.0
            Reporter: Sean Busbey
            Priority: Minor

On HMaster startup, we look for WAL directories that can indicate the need for recovery. If there are files in a WAL directory, we go through the whole recovery process and eventually delete the directory. However, if the directory is empty, we skip over it as a non-error condition. I think the intention for the empty ones is that we could just reuse them.
Unfortunately, since our WAL directory names include a server-start timestamp, we don't reuse them and instead keep around a bunch of old directories.

For example, this server is running only one RegionServer, and it has been through some issues.

{code}
[busbey@edge ~]$ sudo -u hdfs hdfs dfs -ls -d /hbase/WALs/rack03server22.hbase.example.com*
drwxrwxrwx - hbase hbase 0 2015-04-04 20:16 /hbase/WALs/rack03server22.hbase.example.com,22101,1428202830692
drwxrwxrwx - hbase hbase 0 2015-04-05 02:54 /hbase/WALs/rack03server22.hbase.example.com,22101,1428204146406
drwxr-xr-x - hbase hbase 0 2015-04-06 14:20 /hbase/WALs/rack03server22.hbase.example.com,22101,1428227900589
drwxr-xr-x - hbase hbase 0 2015-04-07 13:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1428355397531
drwxr-xr-x - hbase hbase 0 2015-04-08 10:12 /hbase/WALs/rack03server22.hbase.example.com,22101,1428438216546
drwxr-xr-x - hbase hbase 0 2015-04-08 12:30 /hbase/WALs/rack03server22.hbase.example.com,22101,1428513527999
drwxr-xr-x - hbase hbase 0 2015-04-10 07:40 /hbase/WALs/rack03server22.hbase.example.com,22101,1428521782656
drwxr-xr-x - hbase hbase 0 2015-04-10 08:23 /hbase/WALs/rack03server22.hbase.example.com,22101,1428677010976
drwxr-xr-x - hbase hbase 0 2015-04-10 08:53 /hbase/WALs/rack03server22.hbase.example.com,22101,1428679573094
drwxr-xr-x - hbase hbase 0 2015-04-13 10:26 /hbase/WALs/rack03server22.hbase.example.com,22101,1428681379039
drwxr-xr-x - hbase hbase 0 2015-04-19 15:28 /hbase/WALs/rack03server22.hbase.example.com,22101,1428946164686
drwxr-xr-x - hbase hbase 0 2015-04-19 15:36 /hbase/WALs/rack03server22.hbase.example.com,22101,1429482692579
drwxr-xr-x - hbase hbase 0 2015-04-21 15:43 /hbase/WALs/rack03server22.hbase.example.com,22101,1429652628679-splitting
drwxr-xr-x - hbase hbase 0 2015-04-22 07:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429665239905
drwxr-xr-x - hbase hbase 0 2015-04-22 08:04 /hbase/WALs/rack03server22.hbase.example.com,22101,1429714674479
drwxr-xr-x - hbase hbase 0 2015-04-22 08:37 /hbase/WALs/rack03server22.hbase.example.com,22101,1429715217130
drwxr-xr-x - hbase hbase 0 2015-04-22 10:28 /hbase/WALs/rack03server22.hbase.example.com,22101,1429717221567
drwxr-xr-x - hbase hbase 0 2015-04-22 11:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429723761988
drwxr-xr-x - hbase hbase 0 2015-04-23 08:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1429726649267
[busbey@edge ~]$
{code}

Most of those are empty, left over from previous clean restarts. The server does still have:

* 1 current WAL that it's using for current operations
* 1 previous WAL that is in recovery
* 1 previous WAL from a failure that hasn't been recognized yet (it restarted again while all masters were down)

Those are easily seen in the non-empty directories:

{code}
[busbey@edge ~]$ sudo -u hdfs hdfs dfs -ls /hbase/WALs/rack03server22.hbase.example.com*
Found 1 items
-rw-r--r-- 3 hbase hbase 83 2015-04-21 15:43 /hbase/WALs/rack03server22.hbase.example.com,22101,1429652628679-splitting/rack03server22.hbase.example.com%2C22101%2C1429652628679.default.1429656231067
Found 1 items
-rw-r--r-- 3 hbase hbase 83 2015-04-22 07:14 /hbase/WALs/rack03server22.hbase.example.com,22101,1429665239905/rack03server22.hbase.example.com%2C22101%2C1429665239905.default.1429712050345
Found 1 items
-rw-r--r-- 3 hbase hbase 83 2015-04-23 08:17 /hbase/WALs/rack03server22.hbase.example.com,22101,1429726649267/rack03server22.hbase.example.com%2C22101%2C1429726649267.default.1429802256366
[busbey@edge ~]$
{code}

So maybe we need an additional cleanup action, run on becoming active master, that removes empty previous WAL directories.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
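
To make the proposed cleanup concrete, here is a minimal Python sketch of the selection rule described in the issue. It is not HBase code and is not taken from any patch. It assumes WAL directory names follow the `<host>,<port>,<startcode>` pattern seen in the listing, uses a local directory as a stand-in for `/hbase/WALs` on HDFS, and takes a hypothetical `live_servers` set standing in for whatever the active master knows about currently registered servers. The function names are illustrative only.

```python
import re
from pathlib import Path

# Directory names in the listing look like "<host>,<port>,<startcode>",
# optionally suffixed with "-splitting" while recovery is in progress.
WAL_DIR_RE = re.compile(
    r"^(?P<host>[^,]+),(?P<port>\d+),(?P<startcode>\d+)(?P<splitting>-splitting)?$"
)


def find_stale_empty_wal_dirs(wal_root, live_servers):
    """Return empty WAL directories that belong to no live server instance.

    wal_root: local directory standing in for /hbase/WALs (assumption).
    live_servers: set of "<host>,<port>,<startcode>" names (hypothetical input).
    """
    stale = []
    for entry in sorted(Path(wal_root).iterdir()):
        if not entry.is_dir():
            continue
        match = WAL_DIR_RE.match(entry.name)
        if match is None:
            continue  # unrecognized name: leave it alone
        if match.group("splitting"):
            continue  # recovery in progress: never touch these
        if entry.name in live_servers:
            continue  # a running server may be about to write here
        if any(entry.iterdir()):
            continue  # non-empty: needs the normal recovery path
        stale.append(entry)
    return stale


def remove_stale_empty_wal_dirs(wal_root, live_servers):
    """Delete the directories selected above and return their names."""
    removed = []
    for directory in find_stale_empty_wal_dirs(wal_root, live_servers):
        directory.rmdir()  # raises OSError if the directory is no longer empty
        removed.append(directory.name)
    return removed
```

Deleting with `rmdir` rather than a recursive delete is a deliberate choice in this sketch: if a file appears in the directory between the emptiness check and the removal, the call fails instead of discarding a WAL. A real implementation against HDFS would need an equivalent non-recursive delete and would have to decide how to treat servers that have started but not yet registered with the master.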