hbase-issues mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-21751) WAL creation fails during region open may cause region assign forever fail
Date Wed, 23 Jan 2019 07:28:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749607#comment-16749607 ]

Allan Yang commented on HBASE-21751:

{quote}But if you do not use multi WAL, this will not cause a very big problem?{quote}
We do not use multi WAL. Yes, this can happen when the RS has not hosted any region before, but in our case
it is the meta WAL, and the RS did not host the meta region before.
{quote}And we will retry a lot of times when rolling a WAL, so for your production, the first thing
is that why we still fail after so many retries? The actual problem is on HDFS?{quote}
Yes, HDFS caused this. This time it was because the disks were full, but we have seen
other HDFS glitches make a log roll fail as well. The disk-full problem actually recovered on its own
soon after the hfiles in the archive dir were deleted. But because of this issue, the meta region could
never come online.

> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 2.2.0, 2.1.3, 2.0.5
>         Attachments: HBASE-21751.patch, HBASE-21751v2.patch
> When the first region opens on an RS, WALFactory creates a WAL file. If the
WAL creation fails, HDFS may in some cases leave an empty file in the dir (e.g. when the disk is full,
the file is created successfully but block allocation fails). AbstractFSWAL checks
whether a WAL belonging to the same factory already exists and throws an error if it does. As a result, the region
can never be opened on this RS afterwards.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301):
Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>         at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>         at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>         at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
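
The failure mode described above can be reproduced in miniature. The sketch below is not HBase code: it uses java.nio on the local filesystem in place of HDFS, and the class WalLeftoverSketch, its methods, and the prefix "rs1.meta" are hypothetical names chosen for illustration. strictCheck mimics the reported behaviour, where any existing file with the WAL prefix is treated as fatal. lenientCheck shows one possible way to tolerate an empty leftover file; it is an assumption and not necessarily what the attached patches do.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not HBase code. The local filesystem stands in for HDFS. */
final class WalLeftoverSketch {

  /** Lists the files in walDir whose names start with the given WAL prefix. */
  static List<Path> existingWals(Path walDir, String prefix) throws IOException {
    List<Path> found = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(
        walDir, p -> p.getFileName().toString().startsWith(prefix))) {
      for (Path p : files) {
        found.add(p);
      }
    }
    return found;
  }

  /** Strict check modelled on the reported behaviour: any matching file is fatal. */
  static void strictCheck(Path walDir, String prefix) throws IOException {
    if (!existingWals(walDir, prefix).isEmpty()) {
      throw new IOException("Target WAL already exists within directory " + walDir);
    }
  }

  /**
   * Lenient variant (an assumption for illustration): zero-length leftovers are
   * deleted, while a non-empty file is still treated as fatal.
   * Returns the number of empty files removed.
   */
  static int lenientCheck(Path walDir, String prefix) throws IOException {
    int removed = 0;
    for (Path p : existingWals(walDir, prefix)) {
      if (Files.size(p) > 0) {
        throw new IOException("Target WAL already exists within directory " + walDir);
      }
      Files.delete(p);
      removed++;
    }
    return removed;
  }

  public static void main(String[] args) throws IOException {
    Path walDir = Files.createTempDirectory("wals");
    // Simulate a failed WAL creation that left an empty file behind.
    Files.createFile(walDir.resolve("rs1.meta.1"));

    try {
      strictCheck(walDir, "rs1.meta");
    } catch (IOException e) {
      // With the strict check, every later open attempt fails the same way.
      System.out.println("strict check failed: " + e.getMessage());
    }

    int removed = lenientCheck(walDir, "rs1.meta");
    System.out.println("empty leftovers removed: " + removed);

    // After the cleanup the strict check no longer throws.
    strictCheck(walDir, "rs1.meta");
    System.out.println("strict check passes after cleanup");
  }
}
```

The lenient variant deliberately refuses to delete a non-empty file, since such a file may hold edits that still need to be replayed.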

This message was sent by Atlassian JIRA