From: Keith Turner
Date: Wed, 21 Sep 2016 10:52:17 -0400
Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]
To: user@accumulo.apache.org

On Wed, Sep 7, 2016 at 8:55 PM, Dickson, Matt MR wrote:
> UNOFFICIAL
>
> In Zookeeper there don't appear to be any locks with the same txid that is
> listed via Accumulo. However under /accumulo/xxxxxxxx/table_locks/+default/
> there are the same number of files as orphaned locks labelled
> 'lock-000000xx', are these the locks I can delete?

Every table lock should have an associated fate transaction. The fate
print tool has analysis[1] to look for situations when this is not the
case. It seems like this analysis is finding locks w/o transactions.
Looking at the code, I see there is a slight possibility of a race
condition. It reads the list of locks into memory and then reads the
transactions. So if a transaction completed and deleted its lock between
those steps, the tool could report a false positive. If you run the tool
multiple times and it reports the same thing, then that race condition is
not the cause.

Yes, you should delete the locks. These orphaned locks can hold up
future table operations. It would be nice to find out what caused this.
You can grep for the fate txid related to the locks in the master log to
gather info about the fate tx that created the lock.

Do you know if the tables whose IDs it is complaining about as having
locks were deleted? Do you know if fate transactions were deleted
(manually in ZK or using Accumulo's tool)?
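To illustrate the "run the tool multiple times" check, here is a rough sketch. It is not the AdminUtil code and does not talk to ZooKeeper or Accumulo. The snapshot format, the function names, and the sample txids are all made up for illustration; each snapshot stands for one run that reads the lock holders first and the fate transactions second.

```python
from typing import Iterable, Set, Tuple

# Hypothetical snapshot of one run of the analysis:
# (txids that hold table locks, txids of fate transactions seen afterwards).
Snapshot = Tuple[Set[str], Set[str]]


def orphaned_locks(lock_txids: Set[str], fate_txids: Set[str]) -> Set[str]:
    """Return txids that hold a lock but have no matching fate transaction."""
    return lock_txids - fate_txids


def persistent_orphans(snapshots: Iterable[Snapshot]) -> Set[str]:
    """Keep only the orphans that are reported by every snapshot.

    A lock whose transaction finished between the two reads shows up as an
    orphan in one run only, so intersecting the runs filters it out.
    """
    result = None
    for locks, txs in snapshots:
        found = orphaned_locks(locks, txs)
        result = found if result is None else result & found
    return result or set()


# Illustrative data only (assumed txids, not real output).
runs = [
    # Run 1: bbb2 completed after the locks were read, so it looks orphaned.
    ({"aaa1", "bbb2", "ccc3"}, {"ccc3"}),
    # Run 2: bbb2's lock is gone; aaa1 is still reported.
    ({"aaa1", "ccc3"}, {"ccc3"}),
]
print(sorted(persistent_orphans(runs)))
```

Anything that survives the intersection was reported on every run, which by the reasoning above points away from the race condition as the cause.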
[1]: https://github.com/apache/accumulo/blob/rel/1.8.0/fate/src/main/java/org/apache/accumulo/fate/AdminUtil.java#L69

>
> I should note that while investigating this there were no other fate
> transactions being listed by Accumulo for this table, +default, so the
> system was in a stable state.
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.elser@gmail.com]
> Sent: Thursday, 8 September 2016 01:07
> To: user@accumulo.apache.org
> Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]
>
> Hi Matt,
>
> What version of Accumulo are you using? Figuring out why those
> transactions aren't automatically getting removed is something else we
> would want to look into.
>
> It sounds like these transactions are just vestigial (not actually
> running), so I wouldn't think that they would affect current bulk loads.
>
> I believe you could just stop the Master and remove the corresponding
> nodes in ZooKeeper (as that's where the txns are stored and `fate print`
> is reading from), but I would defer to Keith for confirmation first :)
>
> Dickson, Matt MR wrote:
>> *UNOFFICIAL*
>>
>> When running 'fate print -t IN_PROGRESS' to list fate transactions
>> there are approximately eight orphaned locks listed as:
>>
>> txid: xxxxxxxxxxxxxxx locked: [R:+default]
>>
>> I'm looking into these because bulk ingests are failing and there are
>> a lot of CopyFailed transactions in the fate lock list. Could these
>> orphaned locks block further bulk ingests and is there a way to kill
>> them?
>>
>> When I run 'fate fail xxxxxxxx' it states there is no fate transaction
>> associated with the transaction id.
>>
>> Thanks in advance,
>> Matt