From: Enis Söztutar
Date: Wed, 9 Nov 2016 11:47:35 -0800
Subject: Re: Replication Issue, Attempting to flush snapshot with id = -1
To: "dev@hbase.apache.org"

Indeed this looks like HBASE-16270.

Enis

On Wed, Nov 9, 2016 at 11:31 AM, Ted Yu wrote:

> Can you take a look at HBASE-16270 ?
>
> I did a brief search for 'UnexpectedStateException: Current snapshot id'
> which ended up with the above JIRA.
>
> See if it applies to your case.
>
> Cheers
>
> On Wed, Nov 9, 2016 at 10:42 AM, Timothy Brown wrote:
>
> > Regarding the config I was referring to "*hbase.replication* (Default:
> > false) - Controls whether replication is enabled or disabled for the
> > cluster." (from https://hbase.apache.org/0.94/replication.html)
> >
> > Unfortunately the issue happened over night and the exception gets thrown
> > multiple times per second. Here's more of the logs for reference though
> > http://pastebin.com/7KxZTrmf
> >
> > On Wed, Nov 9, 2016 at 10:31 AM, Ted Yu wrote:
> >
> > > bq. hbase.replication
> > >
> > > Not sure which config you were referring to above.
> > >
> > > Can you pastebin more of the region server log around the time exception
> > > happened ?
> > >
> > > Thanks
> > >
> > > On Wed, Nov 9, 2016 at 10:24 AM, Timothy Brown wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently trying to enable High Availability for my HBase cluster.
> > > > I'm using HBase version 1.2.0 provided by Cloudera's cdh5.8.0.
> > > > Everything works for a couple hours and then replication stops due to
> > > > the exception pasted below. We see sizeOfLogQueue continue to grow
> > > > every few minutes. Has anyone else run into this or know how we may
> > > > have gotten into this state?
> > > >
> > > > Non Default Configs set:
> > > >
> > > > hbase.region.replica.replication.enabled
> > > >
> > > > hbase.replication
> > > >
> > > > Exception seen:
> > > >
> > > > Wed Nov 09 00:43:27 UTC 2016,
> > > > RpcRetryingCaller{globalStartTime=1478652206658, pause=100, retries=35},
> > > > org.apache.hadoop.hbase.regionserver.UnexpectedStateException:
> > > > org.apache.hadoop.hbase.regionserver.UnexpectedStateException: Current snapshot id is -1,passed 1478639480535
> > > >     at org.apache.hadoop.hbase.regionserver.DefaultMemStore.clearSnapshot(DefaultMemStore.java:191)
> > > >     at org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1082)
> > > >     at org.apache.hadoop.hbase.regionserver.HStore.access$600(HStore.java:119)
> > > >     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.replayFlush(HStore.java:2377)
> > > >     at org.apache.hadoop.hbase.regionserver.HRegion.replayFlushInStores(HRegion.java:4565)
> > > >     at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:4471)
> > > >     at org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:4272)
> > > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:835)
> > > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1765)
> > > >     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22255)
> > > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> > > >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> > > >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> > > >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> > > >     at java.lang.Thread.run(Thread.java:745)
> > > >
> > > > Thanks,
> > > >
> > > > Tim
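
For readers following the trace, the message "Current snapshot id is -1,passed 1478639480535" can be reproduced with a toy model. This is only a sketch and not the HBase source: ToyMemStore, replayCommit and NO_SNAPSHOT_ID are hypothetical names, and the sketch assumes (as the message text suggests) that clearSnapshot compares the id it is given with the id of the snapshot the memstore currently holds, with -1 meaning "no snapshot".

```java
// Toy model only: hypothetical classes, NOT the HBase implementation.
public class SnapshotIdSketch {
    // Assumption: -1 stands for "no snapshot is currently held".
    static final long NO_SNAPSHOT_ID = -1L;

    static class UnexpectedStateException extends Exception {
        UnexpectedStateException(String msg) {
            super(msg);
        }
    }

    static class ToyMemStore {
        private long snapshotId = NO_SNAPSHOT_ID;

        // Remember the id of the snapshot taken when a flush is prepared.
        void snapshot(long id) {
            this.snapshotId = id;
        }

        // Clearing succeeds only if the caller passes the id of the held snapshot.
        void clearSnapshot(long id) throws UnexpectedStateException {
            if (this.snapshotId != id) {
                throw new UnexpectedStateException(
                    "Current snapshot id is " + this.snapshotId + ",passed " + id);
            }
            this.snapshotId = NO_SNAPSHOT_ID;
        }
    }

    // Simulates replaying a flush commit marker; returns "ok" or the error text.
    static String replayCommit(ToyMemStore store, long flushId) {
        try {
            store.clearSnapshot(flushId);
            return "ok";
        } catch (UnexpectedStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyMemStore store = new ToyMemStore();
        store.snapshot(1478639480535L);
        // First commit clears the snapshot that was taken.
        System.out.println(replayCommit(store, 1478639480535L));
        // A second commit for the same id finds no snapshot left.
        System.out.println(replayCommit(store, 1478639480535L));
    }
}
```

Under these assumptions the first call prints "ok" and the second prints the same text as the logged exception. Whether a repeated clear is what actually happened on the cluster in this thread is not established here; HBASE-16270 is the place to check.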