hbase-user mailing list archives

From Pankaj kr <pankaj...@huawei.com>
Subject RE: Region server getting aborted in every one or two days
Date Wed, 23 Mar 2016 12:44:01 GMT
Thanks for replying, Anoop.

No explicit close operation happened on the WAL file (the log had been rolled a few seconds
earlier). According to the HDFS logs, there was no close call on this WAL file.


The same issue happened again on 19 March.

Here the WAL was rolled just before the issue occurred:
2016-03-19 05:38:07,153 | INFO  | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | Rolled WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337083824 with entries=6508, filesize=61.03 MB; new WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337087136 | org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:972)

A few seconds later, during a sync operation:
2016-03-19 05:38:10,075 | ERROR | sync.1 | Error syncing, request close of wal  | org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1346)
java.nio.channels.ClosedChannelException
	at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:545)
	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
	at java.lang.Thread.run(Thread.java:745)
2016-03-19 05:38:10,076 | INFO  | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | Rolled WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337087136 with entries=6383, filesize=61.51 MB; new WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337090049 | org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:972)
2016-03-19 05:38:10,087 | FATAL | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | ABORTING region server RS-HOSTNAME,21302,1458301420876: IOE in log roller | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2055)
java.nio.channels.ClosedChannelException
	at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:545)
	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
	at java.lang.Thread.run(Thread.java:745)
2016-03-19 05:38:10,088 | FATAL | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver, org.apache.hadoop.hbase.JMXListener, org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver] | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2063)

Here too, there are no error details in the DataNode/NameNode logs.
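
To line up roll and sync events when going through the RS log, something like the following Python sketch could help. It is only an illustration: the sample lines are abbreviated from the log above (thread names and paths shortened), the helper names parse_events, gaps_since_last_roll and wal_suffix_time are made up for this example, and it assumes the numeric suffix of a WAL file name is a creation time in epoch milliseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Sample lines abbreviated from the region server log quoted above
# (thread names and WAL paths are shortened for readability).
SAMPLE = """\
2016-03-19 05:38:07,153 | INFO  | logRoller | Rolled WAL ...default.1458337083824 with entries=6508; new WAL ...default.1458337087136
2016-03-19 05:38:10,075 | ERROR | sync.1 | Error syncing, request close of wal
2016-03-19 05:38:10,076 | INFO  | logRoller | Rolled WAL ...default.1458337087136 with entries=6383; new WAL ...default.1458337090049
2016-03-19 05:38:10,087 | FATAL | logRoller | ABORTING region server: IOE in log roller
"""

# Timestamp with milliseconds, then the log level.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) \| (\w+)")
# Numeric suffix of the new WAL file (assumed to be epoch milliseconds).
NEW_WAL = re.compile(r"new WAL \S*\.(\d{13})\b")


def parse_events(text):
    """Return (timestamp, level, new_wal_suffix_or_None) per log line."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # stack-trace lines and wrapped text are skipped
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        when += timedelta(milliseconds=int(m.group(2)))
        roll = NEW_WAL.search(line)
        events.append((when, m.group(3), int(roll.group(1)) if roll else None))
    return events


def gaps_since_last_roll(events):
    """For each ERROR/FATAL line, seconds elapsed since the latest WAL roll."""
    last_roll = None
    gaps = []
    for when, level, suffix in events:
        if suffix is not None:
            last_roll = when
        elif level in ("ERROR", "FATAL") and last_roll is not None:
            gaps.append((level, (when - last_roll).total_seconds()))
    return gaps


def wal_suffix_time(suffix_ms):
    """Interpret a WAL file suffix as epoch milliseconds (assumption), in UTC."""
    base = datetime.fromtimestamp(suffix_ms // 1000, tz=timezone.utc)
    return base + timedelta(milliseconds=suffix_ms % 1000)


if __name__ == "__main__":
    for level, seconds in gaps_since_last_roll(parse_events(SAMPLE)):
        print(f"{level}: {seconds:.3f} s after the latest WAL roll")
    print(wal_suffix_time(1458337087136).isoformat())
```

On the sample lines this reports the sync error 2.922 s after the first roll and the abort 0.011 s after the second roll. The suffix 1458337087136 decodes to 2016-03-18 21:38:07.136 UTC, which would match the 05:38:07 roll only if the server logs in UTC+8; that offset is an assumption, not something stated in the logs.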

I am still investigating this and will update if I find anything.

Regards,
Pankaj

-----Original Message-----
From: Anoop John [mailto:anoop.hbase@gmail.com] 
Sent: Wednesday, March 23, 2016 3:50 PM
To: user@hbase.apache.org
Subject: Re: Region server getting aborted in every one or two days

At the same time, did any explicit close op happen on the WAL file?  Any log rolling?  Can you
check the logs to find out?  Maybe check the HDFS logs for close calls on the WAL file?

-Anoop-

On Wed, Mar 23, 2016 at 12:10 PM, Pankaj kr <pankaj.kr@huawei.com> wrote:
> Hi,
>
> In our production environment, the RS is getting aborted every one or two days with the following exception.
>
> 2016-03-16 13:57:07,975 | FATAL | MemStoreFlusher.0 | ABORTING region server xyz-vm8,24502,1458034278600: Replay of WAL required. Forcing server shutdown | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2055)
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TB_WEBLOGIN_201603,060,1457916997964.06e204d3bc262b72820aa195fec23513.
>                 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2423)
>                 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2128)
>                 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2090)
>                 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1983)
>                 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1909)
>                 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:509)
>                 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:470)
>                 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:74)
>                 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>                 at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
>                 at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
>                 at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
>                 at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:635)
>                 at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
>                 at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>                 at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
>                 at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
>                 ... 1 more
>
> I don't see any error info on the HDFS side at that point in time.
> Has anyone faced this issue?
>
> HBase version is 0.98.6.
>
> Regards,
> Pankaj