From: Pankaj kr
To: "user@hbase.apache.org"
Subject: Region server getting aborted in every one or two days
Date: Wed, 23 Mar 2016 06:40:50 +0000

Hi,

In our production environment, a region server (RS) is aborting every one or two days with the following exception.

2016-03-16 13:57:07,975 | FATAL | MemStoreFlusher.0 | ABORTING region server xyz-vm8,24502,1458034278600: Replay of WAL required. Forcing server shutdown | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2055)
org.apache.hadoop.hbase.DroppedSnapshotException: region: TB_WEBLOGIN_201603,060,1457916997964.06e204d3bc262b72820aa195fec23513.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2423)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2128)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2090)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1983)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1909)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:509)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:470)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:74)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:635)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
        ... 1 more

I don't see any error info on the HDFS side at that point in time. Has anyone faced this issue?

HBase version is 0.98.6.

Regards,
Pankaj