Date: Fri, 24 Apr 2015 18:50:38 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Created] (ACCUMULO-3750) Bad instance.secret causes master to repeatedly fail fast attempting to acquire lock

Josh Elser created ACCUMULO-3750:
------------------------------------

             Summary: Bad instance.secret causes master to repeatedly fail fast attempting to acquire lock
                 Key: ACCUMULO-3750
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3750
             Project: Accumulo
          Issue Type: Bug
          Components: master
    Affects Versions: 1.6.2, 1.6.1, 1.6.0, 1.5.2, 1.5.1, 1.5.0
            Reporter: Josh Elser
            Assignee: Josh Elser
             Fix For: 1.7.0, 1.6.3

Accidentally restarted a small cluster with bad configuration (missing instance.secret). The tabletservers bailed out quickly, but the master sat in a tight loop trying to get the lock.

{noformat}
2015-04-23 11:48:12,356 [trace.DistributedTrace] INFO : SpanReceiver org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2015-04-23 11:48:12,357 [master.Master] INFO : trying to get master lock
2015-04-23 11:48:12,395 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:13,043 [server.Accumulo] WARN : System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
2015-04-23 11:48:13,410 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:14,418 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:15,426 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:16,433 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:17,440 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
2015-04-23 11:48:18,449 [master.Master] WARN : Failed to get master lock org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/dc25a857-19d8-4387-bec0-64b4dc17cafb/masters/lock/zlock-
{noformat}

It looks like the only case in which the Master exits after failing to acquire the lock is an illegal state where the master thinks it already holds the lock. If we get a NoAuthException, we should not attempt to get the lock again.
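
To illustrate the proposed behavior, here is a minimal Java sketch of a retry loop that treats an authorization failure as fatal rather than retryable. It is not the Accumulo Master code and not the actual fix. Every name in it (`LockRetrySketch`, `LockAttempt`, `acquireOrFail`, and the nested `NoAuthException` that merely stands in for ZooKeeper's `KeeperException.NoAuthException`) is hypothetical, as are the bounded attempt count and the sleep interval.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: none of these types are Accumulo or ZooKeeper classes.
class LockRetrySketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoAuthException. */
    static class NoAuthException extends Exception {
        NoAuthException(String message) {
            super(message);
        }
    }

    /** Hypothetical lock attempt: returns true once the lock is held. */
    interface LockAttempt {
        boolean tryLock() throws NoAuthException;
    }

    /**
     * Retries until the lock is acquired or maxAttempts is reached, but stops
     * immediately on NoAuthException, since retrying cannot succeed until the
     * configuration (for example instance.secret) is corrected.
     * Returns the number of attempts it took to acquire the lock.
     */
    static int acquireOrFail(LockAttempt attempt, int maxAttempts, long sleepMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (attempt.tryLock()) {
                    return i;
                }
            } catch (NoAuthException e) {
                // Not retryable: surface the error instead of looping.
                throw new IllegalStateException(
                        "Not authorized to acquire the lock; check instance.secret", e);
            }
            try {
                Thread.sleep(sleepMillis); // back off before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the lock", e);
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        try {
            // Simulate a ZooKeeper node the process is not authorized to write to.
            acquireOrFail(() -> {
                calls.incrementAndGet();
                throw new NoAuthException("KeeperErrorCode = NoAuth");
            }, 5, 10L);
        } catch (IllegalStateException e) {
            System.out.println("Stopped after " + calls.get() + " attempt(s): " + e.getMessage());
        }
    }
}
```

In a real server process the `IllegalStateException` would presumably be replaced by logging the error and exiting, which is how the tabletservers behaved in the report above.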