Date: Mon, 13 Jul 2015 18:38:05 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Resolved] (HBASE-13724) ReplicationSource dies under certain conditions reading a sequence file

     [ https://issues.apache.org/jira/browse/HBASE-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-13724.
------------------------------------
    Resolution: Not A Problem

Resolving as Not A Problem because there hasn't been any progress and we can't recommend running in production with -ea. Never mind HBase code, what else is out there waiting to be tripped. No problem to reopen when and if there's a patch available for review.

> ReplicationSource dies under certain conditions reading a sequence file
> -----------------------------------------------------------------------
>
>                 Key: HBASE-13724
>                 URL: https://issues.apache.org/jira/browse/HBASE-13724
>             Project: HBase
>          Issue Type: Bug
>            Reporter: churro morales
>
> A little background:
> We run our server in -ea mode and have seen quite a few replication sources silently die over the past few months.
> Note: the stack trace I posted below comes from a regionserver running 0.94, but from a quick look at this issue, I believe this will happen in 0.98 too.
> Should we harden the replication source to deal with these types of assertion errors by catching Throwables, or should we be dealing with this at the sequence file reader level? We are still looking into the root cause of this issue, but when we manually shut down our regionservers, the regionserver that recovered the queue replicated that log just fine. So in our case a simple retry would have worked.
> {code}
> 2015-05-08 11:04:23,348 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hm6.xxx.flurry.com:9000/hbase/.logs/xxxxx.yy.flurry.com,60020,1426792702998/xxxxx.atl.flurry.com%2C60020%2C1426792702998.1431107922449
> java.lang.AssertionError
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader$WALReaderFSDataInputStream.getPos(SequenceFileLogReader.java:121)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1489)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:178)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:734)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:583)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:373)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
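
For illustration only, the "catch throwables and retry" idea raised in the report might look roughly like the sketch below. Nothing here is HBase code: the class RetryingOpen, the method openWithRetry, and its maxAttempts and sleepMillis parameters are all hypothetical names, and the failing open is simulated with a lambda. The point it tries to show is that AssertionError is an Error rather than an Exception, so a handler that only catches Exception would not see it.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of HBase: retries an open attempt that fails. */
public class RetryingOpen {

    /**
     * Calls attempt up to maxAttempts times and returns the first successful result.
     * Catches Throwable so that an AssertionError (an Error, not an Exception)
     * triggers a retry instead of killing the calling thread.
     */
    public static <T> T openWithRetry(Callable<T> attempt, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Throwable last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (VirtualMachineError e) {
                // OutOfMemoryError and similar are rethrown, not retried.
                throw e;
            } catch (Throwable t) {
                last = t;
                if (i < maxAttempts) {
                    Thread.sleep(sleepMillis); // back off before the next attempt
                }
            }
        }
        throw new Exception("giving up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated reader open: fails once with an AssertionError, then succeeds.
        String result = openWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new AssertionError("simulated assertion failure while opening");
            }
            return "reader";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Whether a wrapper like this belongs in ReplicationSource or closer to the sequence file reader is exactly the open question in the report; the sketch takes no position on that, and retrying on arbitrary Errors is a trade-off the resolution comment above argues against for production use.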