Date: Thu, 16 Jan 2014 20:17:21 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-2198) Concurrent randomwalk fails with unbalanced servers

    [ https://issues.apache.org/jira/browse/ACCUMULO-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873880#comment-13873880 ]

ASF subversion and git services commented on ACCUMULO-2198:
-----------------------------------------------------------

Commit cd4eac0d7e2820321db9fc9cdfc8dc89f7dd53d2 in branch refs/heads/1.6.0-SNAPSHOT from [~bhavanki]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=cd4eac0 ]

ACCUMULO-2198 Concurrent randomwalk: add teardown, fix server balance check

The Concurrent randomwalk test had been using a test node property to
remember the last time when servers were unbalanced, but this property was
not getting cleaned up between runs. Therefore, if a new Concurrent test
was started some time later, it would pick up the old timestamp property
from the last run. This commit adds removal of the property during test
teardown, and also moves the tracking from a node property to test state.

In addition, the test logic would reset the timestamp every time servers
were found unbalanced, provided the 15-minute allowance hadn't expired.
This commit fixes that issue as well. This could lead to more, correct,
reports of unbalanced servers.

Lastly, the test in 1.5.x requires three checks for unbalanced servers to
fail before failing the test. This commit backports that requirement to
1.4.x. The timestamp reset and three-check fixes were added to 1.5.x in
commit 0ee7e5a8.


> Concurrent randomwalk fails with unbalanced servers
> ---------------------------------------------------
>
>                 Key: ACCUMULO-2198
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2198
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>              Labels: randomwalk, test
>
> Not always, but sometimes I am seeing the Concurrent randomwalk test fail with:
> {noformat}
> java.lang.Exception: Error running node Concurrent.xml
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
>         ...
> Caused by: java.lang.Exception: Error running node ct.CheckBalance
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 8 more
> Caused by: java.lang.Exception: servers are unbalanced!
>         at org.apache.accumulo.server.test.randomwalk.concurrent.CheckBalance.visit(CheckBalance.java:74)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 9 more
> {noformat}
> In one case, the 15-minute allowance for balancing extended to a prior run of Concurrent.xml within the same overall test run. In another case, the time span begins at a point when HDFS failed to contact a datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
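
For readers of the archive, the balance-check behavior described in the
commit message above can be sketched roughly as follows. This is a minimal
illustration and not the actual CheckBalance code: the class name
BalanceTracker, its method names, and the way the three-check rule is
combined with the 15-minute allowance (a check counts as failed only when
servers are still unbalanced after the allowance has expired) are all
assumptions made for the example. Only the 15-minute allowance, the
three-check requirement, the "do not reset the timestamp" fix, and the
teardown cleanup come from the message itself.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are illustrative assumptions,
// not the real Accumulo randomwalk classes.
class BalanceTracker {
    // From the commit message: a 15-minute allowance for balancing.
    static final long ALLOWANCE_MS = TimeUnit.MINUTES.toMillis(15);
    // From the commit message: three failed checks before the test fails.
    static final int REQUIRED_FAILURES = 3;

    // Kept in test state (here, plain fields) rather than a node property.
    private Long firstUnbalancedMs = null;
    private int failedChecks = 0;

    /**
     * Records one balance check.
     * Returns true when the test should fail (assumed rule: three checks
     * that find servers unbalanced after the allowance has expired).
     */
    boolean check(boolean balanced, long nowMs) {
        if (balanced) {
            // Servers recovered: forget the earlier unbalanced period.
            firstUnbalancedMs = null;
            failedChecks = 0;
            return false;
        }
        if (firstUnbalancedMs == null) {
            // Set the timestamp only once. Resetting it on every
            // unbalanced check would keep the allowance from expiring.
            firstUnbalancedMs = nowMs;
        }
        if (nowMs - firstUnbalancedMs > ALLOWANCE_MS) {
            failedChecks++;
        }
        return failedChecks >= REQUIRED_FAILURES;
    }

    /** Clears tracked state so a later run does not see a stale timestamp. */
    void tearDown() {
        firstUnbalancedMs = null;
        failedChecks = 0;
    }
}
```

In this sketch, calling tearDown() at the end of a run plays the role of the
property removal the commit adds: a Concurrent test started later begins
with no remembered timestamp, so its allowance cannot extend back into a
prior run.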