Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Thu, 1 May 2014 17:03:16 +0000 (UTC)
From: "Keith Turner (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12711734.1398963528083.214172.1398963796107@arcas>
In-Reply-To: <JIRA.12711734.1398963528083@arcas>
References: <JIRA.12711734.1398963528083@arcas>
Subject: [jira] [Updated] (ACCUMULO-2768) Agitator not restarting all
 datanodes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/ACCUMULO-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-2768:
-----------------------------------

    Affects Version/s: 1.5.1

> Agitator not restarting all datanodes
> -------------------------------------
>
>                 Key: ACCUMULO-2768
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2768
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.5.1, 1.6.0
>         Environment: 1.6.0 RC5, hadoop 2.2.0, ZK 3.4.5
> 20 node EC2 cluster
>            Reporter: Keith Turner
>             Fix For: 1.5.2, 1.6.1
>
>
> I ran a 24-hour CI test against 1.6.0 RC5 with agitation.
> I modified the agitation settings to the following:
> {noformat}
> #time amount of time (in minutes) the agitator should sleep before killing
> KILL_SLEEP_TIME=3
> #time amount of time (in minutes) the agitator should sleep after killing before running tup 
> TUP_SLEEP_TIME=1
> #the minimum and maximum server the agitator will kill at once
> MIN_KILL=1
> MAX_KILL=2
> {noformat}
> I started 3 walkers, all of which died.  The walkers saw {{org.apache.accumulo.core.client.impl.AccumuloServerException}}. On the tserver the cause was {{org.apache.hadoop.hdfs.BlockMissingException}}.
> After stopping the agitation scripts, I ran {{start-dfs.sh}} and saw it started 5 datanodes.  Looking at {{datanode-agitator.pl}}, I think the problem is that when it kills two datanodes, it only restarts one.
> All of my ingest clients survived and were able to write 8 billion entries in this wacky environment.  I noticed on the monitor that there were long periods of no ingest, but it was not a complete flat line.
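
The suspected failure mode described above (two datanodes killed, only one restarted) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the logic of {{datanode-agitator.pl}}: the function names `agitate` and `simulate`, the `restart_all` flag, and the idea of modeling datanodes as integers in a set are all hypothetical. Only the MIN_KILL=1 / MAX_KILL=2 values come from the report.

```python
import random


def agitate(live, min_kill, max_kill, restart_all, rng):
    """Run one hypothetical kill/restart cycle and return the new live set."""
    live = set(live)
    # Kill between min_kill and max_kill nodes, but never more than are alive.
    count = min(rng.randint(min_kill, max_kill), len(live))
    killed = rng.sample(sorted(live), count)
    live.difference_update(killed)
    # Assumed bug model: only the first killed node is brought back.
    restarted = killed if restart_all else killed[:1]
    live.update(restarted)
    return live


def simulate(nodes, cycles, restart_all, seed=0):
    """Return how many datanodes are still running after the given cycles."""
    rng = random.Random(seed)
    live = set(range(nodes))
    for _ in range(cycles):
        # MIN_KILL=1 and MAX_KILL=2, as in the reported settings.
        live = agitate(live, 1, 2, restart_all, rng)
    return len(live)


if __name__ == "__main__":
    print("restart all killed nodes:", simulate(20, 50, restart_all=True))
    print("restart only one node:   ", simulate(20, 50, restart_all=False))
```

Under this model, every cycle that kills two nodes permanently loses one, so the live count can only shrink over a long run. That would be consistent with {{start-dfs.sh}} finding several stopped datanodes afterwards, though the report does not confirm the exact mechanism.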

