Date: Thu, 1 May 2014 17:03:16 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Updated] (ACCUMULO-2768) Agitator not restarting all datanodes

    [ https://issues.apache.org/jira/browse/ACCUMULO-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-2768:
-----------------------------------
    Affects Version/s: 1.5.1

> Agitator not restarting all datanodes
> -------------------------------------
>
>                 Key: ACCUMULO-2768
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2768
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.5.1, 1.6.0
>         Environment: 1.6.0 RC5, Hadoop 2.2.0, ZK 3.4.5
> 20-node EC2 cluster
>            Reporter: Keith Turner
>             Fix For: 1.5.2, 1.6.1
>
>
> I ran a 24-hour continuous ingest (CI) test against 1.6.0 RC5 with agitation.
> I modified the agitation settings as follows:
> {noformat}
> #time amount of time (in minutes) the agitator should sleep before killing
> KILL_SLEEP_TIME=3
> #time amount of time (in minutes) the agitator should sleep after killing before running tup
> TUP_SLEEP_TIME=1
> #the minimum and maximum server the agitator will kill at once
> MIN_KILL=1
> MAX_KILL=2
> {noformat}
> I started 3 walkers, all of which died. The walkers saw {{org.apache.accumulo.core.client.impl.AccumuloServerException}}. On the tserver, the cause was {{org.apache.hadoop.hdfs.BlockMissingException}}.
> After stopping the agitation scripts, I ran {{start-dfs.sh}} and saw that it started 5 datanodes, meaning 5 had been left down. Looking at {{datanode-agitator.pl}}, I think the problem is that when it kills two datanodes, it restarts only one.
> All of my ingest clients survived and were able to write 8 billion entries in this wacky environment. On the monitor I noticed long periods with no ingest, although the ingest rate never went completely flat.
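The restart problem described in this report can be illustrated with a minimal Python sketch of one agitation cycle that records every datanode it kills and then restarts all of them. This is not the code of {{datanode-agitator.pl}} (which is Perl and is not reproduced here). The function {{agitate_once}}, the injected {{kill}}/{{start}}/{{sleep}} callables, and the host names are hypothetical. Only the four settings mirror the configuration quoted above.

```python
import random
from typing import Callable, List, Sequence

# Settings mirroring the agitator configuration quoted in the report.
KILL_SLEEP_TIME = 3  # minutes to sleep before killing
TUP_SLEEP_TIME = 1   # minutes to sleep after killing, before restarting
MIN_KILL = 1         # minimum number of servers killed at once
MAX_KILL = 2         # maximum number of servers killed at once


def agitate_once(
    datanodes: Sequence[str],
    kill: Callable[[str], None],
    start: Callable[[str], None],
    sleep: Callable[[float], None],
    rng: random.Random,
) -> List[str]:
    """Run one hypothetical kill/restart cycle and return the hosts killed."""
    upper = min(MAX_KILL, len(datanodes))
    if upper < MIN_KILL:
        return []  # not enough datanodes to agitate
    sleep(KILL_SLEEP_TIME * 60)
    # Pick between MIN_KILL and MAX_KILL distinct hosts.
    count = rng.randint(MIN_KILL, upper)
    victims = rng.sample(list(datanodes), count)
    for host in victims:
        kill(host)
    sleep(TUP_SLEEP_TIME * 60)
    # Restart every killed host, not just one of them. Restarting only one
    # would leave a datanode down after every cycle that killed two.
    for host in victims:
        start(host)
    return victims


if __name__ == "__main__":
    nodes = [f"dn{i}" for i in range(20)]
    running = set(nodes)  # simulated set of live datanodes
    rng = random.Random(42)
    for _ in range(100):
        agitate_once(nodes, running.discard, running.add, lambda seconds: None, rng)
    print(len(running))  # 20
```

Passing {{running.discard}} and {{running.add}} as the kill and start callables simulates a 20-node cluster, and after any number of cycles all 20 hosts are live again. If the restart loop started only {{victims[-1]}}, every cycle that picked two hosts would leave one datanode down for good, which is consistent with {{start-dfs.sh}} finding 5 datanodes to start.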