Date: Sun, 28 Aug 2016 21:03:20 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Comment Edited] (ACCUMULO-4425) VolumeIT.testDirtyReplaceVolumes fails

[ https://issues.apache.org/jira/browse/ACCUMULO-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444116#comment-15444116 ]

Josh Elser edited comment on ACCUMULO-4425 at 8/28/16 9:03 PM:
---------------------------------------------------------------

bq.
The label was the easiest first pass I thought of. I could simplify it with a boolean conditional on the outer loop. I'm more concerned about the strategy of handling this in the test itself than the specific implementation.

Yup, I understand and agree with you completely.

bq. I agree with the concern about runtime issues. That's why I put it up for review as a PR. I'm concerned we're not properly handling this internally in the WalStateManager. But, I'm also wondering if this is something that can only happen in the test. The thing is... in the dirty shutdown case, I'm not actually sure why these states persist. Perhaps it's just because the ephemeral ZK nodes haven't timed out yet? Maybe it's not something to be concerned about in a real system and is only an artifact of the test. At the very least, it's clear from the workaround that they will eventually resolve themselves, and maybe that's sufficient for a running system? This part of our code is hard to reason about... because there aren't a lot of comments explaining how the design is supposed to work.

I'm sure [~kturner] will be able to weigh in when he returns. The ephemeral node timing out certainly seems a plausible explanation (did you check that these are ephemeral nodes, though?). I would assume that changing volumes is a rare scenario and thus our test here is stressing things beyond the normal amount.


was (Author: elserj):
bq. The label was the easiest first pass I thought of. I could simplify it with a boolean conditional on the outer loop. I'm more concerned about the strategy of handling this in the test itself than the specific implementation.

Yup, I understand and agree with you completely.

bq. I agree with the concern about runtime issues. That's why I put it up for review as a PR. I'm concerned we're not properly handling this internally in the WalStateManager. But, I'm also wondering if this is something that can only happen in the test. The thing is... in the dirty shutdown case, I'm not actually sure why these states persist. Perhaps it's just because the ephemeral ZK nodes haven't timed out yet? Maybe it's not something to be concerned about in a real system and is only an artifact of the test. At the very least, it's clear from the workaround that they will eventually resolve themselves, and maybe that's sufficient for a running system? This part of our code is hard to reason about... because there aren't a lot of comments explaining how the design is supposed to work.

I'm sure [~kturner]

> VolumeIT.testDirtyReplaceVolumes fails
> --------------------------------------
>
>                 Key: ACCUMULO-4425
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4425
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Christopher Tubbs
>            Assignee: Christopher Tubbs
>            Priority: Blocker
>             Fix For: 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Error Message*
> {code}
> Unexpected volume file:/var/lib/jenkins/workspace/Accumulo-1.8-ITs-failures/test/target/mini-tests/org.apache.accumulo.test.VolumeIT_testDirtyReplaceVolumes/volumes/v1/wal/jenkins.revelc.net+38766/3eb39803-c014-4195-943a-7a12efa2f515
> {code}
> *Stacktrace*
> {code}
> java.lang.AssertionError: Unexpected volume file:/var/lib/jenkins/workspace/Accumulo-1.8-ITs-failures/test/target/mini-tests/org.apache.accumulo.test.VolumeIT_testDirtyReplaceVolumes/volumes/v1/wal/jenkins.revelc.net+38766/3eb39803-c014-4195-943a-7a12efa2f515
> 	at org.apache.accumulo.test.VolumeIT.verifyVolumesUsed(VolumeIT.java:441)
> 	at org.apache.accumulo.test.VolumeIT.testReplaceVolume(VolumeIT.java:533)
> 	at org.apache.accumulo.test.VolumeIT.testDirtyReplaceVolumes(VolumeIT.java:566)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)