Date: Fri, 6 Jun 2014 19:08:05 +0000 (UTC)
From: "Bill Havanki (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Created] (ACCUMULO-2868) Make master configurable in when it kills tablet servers

Bill Havanki created ACCUMULO-2868:
--------------------------------------

             Summary: Make master configurable in when it kills tablet servers
                 Key: ACCUMULO-2868
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2868
             Project: Accumulo
          Issue Type: Improvement
          Components: master
    Affects Versions: 1.6.0
            Reporter: Bill Havanki

On a cluster with a flaky network, the master may be unable to contact a tserver for some moderate amount of time and then direct it to terminate, even though the tserver is still up. (See {{gatherTableInformation()}} and {{StatusThread}}.)
It does not appear possible to configure the master to be more forgiving in these checks.

Relevant constants:
* {{DEFAULT_WAIT_FOR_WATCHER}} - interval between server checks
* {{MAX_BAD_STATUS_COUNT}} - the maximum number of failed attempts allowed before killing the tserver

Making one or both of those configurable, or some other pertinent parameter configurable, would allow cluster admins to cope with mild network maladies.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
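
The failure-counting behaviour described in the report could be made tunable along the following lines. This is only a minimal sketch, not Accumulo's actual code: the class name {{BadStatusTracker}}, its constructor parameter, and the method {{recordCheck}} are all hypothetical. It also assumes that a successful check resets the count and that a tserver is flagged once its consecutive failures exceed the limit. The real master may behave differently on both points.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: tracks consecutive failed status checks per tablet
 * server, with the limit supplied by the caller instead of a constant.
 */
final class BadStatusTracker {
    // Assumed to be read from site configuration by the caller.
    private final int maxBadStatusCount;
    private final Map<String, Integer> badCounts = new HashMap<>();

    BadStatusTracker(int maxBadStatusCount) {
        if (maxBadStatusCount < 1) {
            throw new IllegalArgumentException("maxBadStatusCount must be >= 1");
        }
        this.maxBadStatusCount = maxBadStatusCount;
    }

    /**
     * Records the outcome of one status check.
     *
     * @return true if the server has now failed more consecutive checks
     *         than the configured limit allows
     */
    boolean recordCheck(String server, boolean reachable) {
        if (reachable) {
            // Assumption: any successful contact clears the failure count.
            badCounts.remove(server);
            return false;
        }
        int failures = badCounts.merge(server, 1, Integer::sum);
        return failures > maxBadStatusCount;
    }
}
```

With a tracker like this, an admin on a flaky network could raise the limit so that a short run of missed checks no longer leads to a tserver being told to terminate. The interval between checks could be passed in the same way.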