hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts
Date Mon, 30 Nov 2015 19:54:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Enis Soztutar updated HBASE-14223:
----------------------------------
    Attachment: hbase-14223_v3-master.patch

Attaching master patch. The difference between the master and branch-1 patch is t that master
patch does not have the unit test since meta cannot be moved out of the hmaster in the master
branch. 

The issue affects branch-1 mostly (where meta can be hosted by any regionserver), but we should
still commit the other changes to master branch as well.  

> Meta WALs are not cleared if meta region was closed and RS aborts
> -----------------------------------------------------------------
>
>                 Key: HBASE-14223
>                 URL: https://issues.apache.org/jira/browse/HBASE-14223
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
>         Attachments: HBASE-14223logs, hbase-14223_v0.patch, hbase-14223_v1-branch-1.patch,
hbase-14223_v2-branch-1.patch, hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch,
hbase-14223_v3-master.patch
>
>
> When an RS opens meta, and later closes it, the WAL(FSHlog) is not closed. The last WAL
file just sits there in the RS WAL directory. If RS stops gracefully, the WAL file for meta
is deleted. Otherwise if RS aborts, WAL for meta is not cleaned. It is also not split (which
is correct) since master determines that the RS no longer hosts meta at the time of RS abort.

> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} directories
left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop          0 2015-06-05 01:14 /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop          0 2015-06-05 07:54 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop          0 2015-06-05 09:28 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop          0 2015-06-05 10:01 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop     201608 2015-06-05 03:15 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop      44420 2015-06-05 04:36 /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] zookeeper.MetaTableLocator:
Setting hbase:meta region location in ZooKeeper as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion:
Closed hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller]
wal.FSHLog: Rolled WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
with entries=385, filesize=196.88 KB; new WAL /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
master.SplitLogManager: started splitting 2 logs in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 INFO  [main-EventThread]
wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 INFO  [main-EventThread]
wal.WALSplitter: Archived processed log hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
to hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475175329
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,507 WARN  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
master.SplitLogManager: returning success without actually splitting and deleting all the
log files in path hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,508 INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0]
master.SplitLogManager: finished splitting (more than or equal to) 129135000 bytes in 2 log
files in [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
in 4433ms
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message