Date: Wed, 13 May 2015 02:11:59 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Created] (ACCUMULO-3811) Improve exception during held commits sent back to clients from BatchWriter

Josh Elser created ACCUMULO-3811:
------------------------------------

             Summary: Improve exception during held commits sent back to clients from BatchWriter
                 Key: ACCUMULO-3811
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3811
             Project: Accumulo
          Issue Type: Improvement
          Components: client, tserver
            Reporter: Josh Elser
             Fix For: 1.8.0


Running CI on 1.7.0_rc3 with datanode agitation, I'm frequently seeing the BatchWriter die.
It seems that when the ingester tries to flush right after a datanode dies, the system is trying to minor compact. That blocks the flush and ultimately results in a HoldTimeoutException. Under-replication may mean that no other datanode is available to serve the necessary block. Either way, this is a good example of a failure from which clients currently have no way to recover. Clients should be able to tell that the system is blocking writes, so they can wait and then retry their updates. Right now they just see an opaque AccumuloSecurityException with no indication of the nature of the failure.
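To illustrate what the requested client-side behavior might look like, here is a minimal Java sketch. It assumes a hypothetical `WritesHeldException` that the server would send back when it gives up holding a commit. Neither that class nor the `retryWhileHeld` helper is part of the Accumulo API. The helper waits with exponential backoff and retries only in the held-commit case, so any other failure still surfaces immediately.

```java
import java.util.concurrent.Callable;

/**
 * Sketch of the proposal: a dedicated (hypothetical) exception signalling
 * that the server is holding commits, plus a client helper that waits and
 * retries only for that case.
 */
public class HeldWriteRetry {

  /** Hypothetical exception type; NOT part of the Accumulo API. */
  public static class WritesHeldException extends Exception {
    public WritesHeldException(String message) {
      super(message);
    }
  }

  /**
   * Runs op, retrying with exponential backoff while it fails with
   * WritesHeldException. Any other exception propagates immediately; the
   * last WritesHeldException is rethrown once maxAttempts is used up.
   */
  public static <T> T retryWhileHeld(Callable<T> op, int maxAttempts, long initialBackoffMs)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (WritesHeldException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up: the server kept holding commits
        }
        Thread.sleep(backoffMs); // give the server time to free memory
        backoffMs = Math.min(backoffMs * 2, 10_000L); // arbitrary 10 s cap
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated write that is held for the first two attempts.
    int[] calls = {0};
    String result = retryWhileHeld(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new WritesHeldException("commits held: waiting on minor compaction");
      }
      return "flushed";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts"); // prints "flushed after 3 attempts"
  }
}
```

The design choice is the point of the issue. With an opaque exception, the client cannot tell a transient hold from a real failure, so it cannot safely retry. A distinct exception type makes the retry decision trivial.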