From: "Ed Coleman"
To: user@accumulo.apache.org
Subject: RE: Corrupt WAL
Date: Tue, 21 Aug 2018 18:57:52 -0400

There has been work done in https://github.com/apache/accumulo/pull/574.
I'm not certain of the state of the code, but the description may provide you with things that you could look at manually.

-----Original Message-----
From: tech.shan@gmail.com [mailto:tech.shan@gmail.com]
Sent: Tuesday, August 21, 2018 5:45 PM
To: user@accumulo.apache.org
Subject: Re: Corrupt WAL

Was there any success with this workaround strategy? I am also experiencing this issue.

On 2018/06/13 16:30:22, "Adam J. Shook" wrote:
> Sorry, I had the error backwards. There is an OPEN for the WAL and
> then immediately a COMPACTION_FINISH entry. This would cause the error.
>
> On Wed, Jun 13, 2018 at 11:34 AM, Adam J. Shook wrote:
>
> > Looking at the log I see that the last two entries are
> > COMPACTION_START of one RFile immediately followed by a
> > COMPACTION_START of a separate RFile which (I believe) would lead to
> > the error. Would this necessarily be an issue if the compactions are
> > for separate RFiles?
> >
> > This is a dev cluster and I don't necessarily care about it, but is
> > there a (good) means to do WAL log surgery? I imagine I can just
> > chop off bytes until the log is parseable and missing the info about
> > the compactions.
> >
> > On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner wrote:
> >
> >> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook wrote:
> >> > Yes, that is the error. I'll inspect the logs and report back.
> >>
> >> Ok. The LogReader command has a mechanism to filter which tablet
> >> is displayed. If the walog has a lot of data in it, you may need to
> >> use this.
> >>
> >> Also, be aware that only 5 mutations are shown for a "many mutations"
> >> object in the walog. The -m option changes this. You may want to see
> >> more when deciding if the info in the log is important.
> >>
> >> >
> >> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner wrote:
> >> >>
> >> >> Is the message you are seeing "COMPACTION_FINISH (without
> >> >> preceding COMPACTION_START)"?
> >> >> That message indicates that the
> >> >> WALs are incomplete, probably as a result of the NN problems.
> >> >> Could do the following:
> >> >>
> >> >> 1) Run the following command to see what's in the log. Need to
> >> >> see what is there for the root tablet.
> >> >>
> >> >> accumulo org.apache.accumulo.tserver.logger.LogReader
> >> >>
> >> >> 2) Replace the log file with an empty file after seeing if there
> >> >> is anything important in it.
> >> >>
> >> >> I think the list of WALs for the root tablet is stored in ZK at
> >> >> /accumulo//walogs
> >> >>
> >> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook wrote:
> >> >> > Hey all,
> >> >> >
> >> >> > The root tablet on one of our dev systems isn't loading due to
> >> >> > an illegal state exception -- COMPACTION_FINISH preceding
> >> >> > COMPACTION_START. What'd be the best way to mitigate this
> >> >> > issue? This was likely caused due to both of our NameNodes
> >> >> > failing.
> >> >> >
> >> >> > Thank you,
> >> >> > --Adam
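
For anyone following the thread, the ordering problem discussed above (an OPEN followed immediately by a COMPACTION_FINISH, with no COMPACTION_START in between) can be illustrated with a small sketch. This is only a simplified model, not Accumulo's recovery code: it assumes the event names have already been pulled out of the log by hand into a list of strings, and the function name find_unmatched_finishes is made up for illustration. It makes no assumption about the output format of LogReader.

```python
from typing import Iterable, List, Optional, Tuple


def find_unmatched_finishes(events: Iterable[str]) -> List[Tuple[int, Optional[str]]]:
    """Return (index, previous_event) for every COMPACTION_FINISH that has
    no COMPACTION_START open before it.

    Hypothetical helper: `events` is a plain list of event names written
    down by hand, not something read from a real WAL file.
    """
    problems: List[Tuple[int, Optional[str]]] = []
    start_open = False              # True while a COMPACTION_START awaits its finish
    previous: Optional[str] = None  # event seen just before the current one

    for index, event in enumerate(events):
        if event == "COMPACTION_START":
            start_open = True
        elif event == "COMPACTION_FINISH":
            if not start_open:
                # The case from the thread: a finish with nothing started.
                problems.append((index, previous))
            start_open = False
        previous = event
    return problems


# The sequence Adam described: OPEN, then immediately COMPACTION_FINISH.
print(find_unmatched_finishes(["OPEN", "COMPACTION_FINISH"]))

# A sequence in which the finish is properly preceded by a start.
print(find_unmatched_finishes(["OPEN", "COMPACTION_START", "COMPACTION_FINISH"]))
```

The first call reports the finish at index 1 with OPEN as the event before it; the second reports nothing. A check like this only shows where the sequence is inconsistent. Whether the data in the log matters still has to be decided by looking at the mutations, as Keith suggests.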