From: Josh Elser
Date: Sat, 11 Mar 2017 19:17:04 -0500
Subject: Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2
To: dev@accumulo.apache.org
References: <003901d29a5f$e0121de0$a03659a0$@etcoleman.com> <58C476F0.4090501@gmail.com>
Content-Type: multipart/alternative; boundary=94eb2c191da0c2b7c0054a7d83f2
archived-at: Sun, 12 Mar 2017 00:24:56 -0000

--94eb2c191da0c2b7c0054a7d83f2
Content-Type: text/plain; charset=UTF-8

This is a do-ocracy. Please just change the test if you believe you have a
better way to test what it is trying to test.

On Mar 11, 2017 18:43, "Christopher" wrote:

> On Sat, Mar 11, 2017 at 5:15 PM Josh Elser wrote:
>
> > Christopher,
> >
> > When I wrote that test, there were issues with the minimum functioning
> > renewal period as provided by the embedded KDC from Kerby. That is why
> > this test runs for so long -- anything shorter failed.
> >
>
> I understand that. There was a comment in the code to that effect.
>
>
> > This test passed at one point. I don't run tests on my own hardware to
> > catch regressions anymore after previous discussions with you on this
> > matter.
>
> I don't understand what you mean by this, or how it applies. I'm sure it
> did pass at one point... and may still (hence my question to the group
> asking whether they observed it passing).
>
>
> > In the future, I'd suggest investing the time into investigating
> > why the test actually failed instead of picking apart the test itself.
> >
>
> I did preliminary investigation, and forwarded my observations to the group
> for further discussion. I even suggested a possible cause for the failure.
> But I didn't think it would be productive to dig any deeper without first
> raising what I found to the group for further discussion and feedback.
>
> "picking apart the test itself" is also known as "reviewing code" and
> "investigating".
> I think you're taking my criticism of the code personally,
> and I'm not sure why. The fact is, I got as far as I could at 1AM on
> Saturday, and informed the group of what I experienced, because I thought
> it was relevant to the vote which expires on Monday morning. It seems that
> you'd prefer I postpone my comments until I have some kind of "perfect
> knowledge" of what went wrong with the test and how to fix it. Aside from
> the fact that I knew that I wasn't going to have time before the vote
> concluded on Monday, that makes no sense to me even under ideal
> circumstances... if we all did that, why would we even have a group? We're
> better when we rely on each other's expertise and knowledge, and discuss
> problems (or potential problems) as a team. I would like to see this test
> improved, but I knew that working on it in silence on my own was not going
> to achieve that.
>
>
> > Thanks.
> >
> > Ed Coleman wrote:
> > > I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> > > that I often have trouble with this and a few others.
> > >
> > > Not sure it makes me feel any better, but for me, this is not "new" to
> > > 1.7.3. I thought it could be due to my virtual-box development environment,
> > > but I've tried running verify on an AWS c4.2xlarge instance with the same
> > > intermittent results. I have had it pass, but more often than not it
> > > fails.
> > >
> > > To help decide if 1.7.3-rc0 could be a candidate, I made the following
> > > chart tracking IT issues – and then at one point the KerberosRenewalIT
> > > passed for me (and it passed a few times in a row) and I stopped updating
> > > the chart:
> > >
> > > Instance Type
> > >
> > > AWS1 | AWS2 | AWS3 | OpenBox 1 | OpenBox 2 | OpenBox 3 | Test
> > > -----+------+------+-----------+-----------+-----------+------
> > >      |  x   |  x   |     x     |           |           | AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
> > >      |      |      |           |           |     x     | BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
> > >      |  x   |      |     x     |           |           | ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
> > >      |      |  x   |           |           |           | ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
> > >  x   |  x   |  x   |     x     |           |           | DurabilityIT.testWriteSpeed:103 log should be faster than flush
> > >      |      |  x   |           |           |           | FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
> > >  x   |  x   |  x   |     x     |     x     |     x     | KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
> > >  x   |      |  x   |           |           |           | ShellServerIT.trace:1444
> > >  x   |      |      |           |     x     |           | TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
> > >      |      |      |           |           |           | UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
> > >      |      |      |           |           |     x     | KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
> > >
> > > I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
> > >
> > > -----Original Message-----
> > > From: Christopher [mailto:ctubbsii@apache.org]
> > > Sent: Saturday, March 11, 2017 1:53 AM
> > > To: Accumulo Dev List
> > > Subject: Re: [VOTE] Accumulo 1.7.3-rc2
> > >
> > > +1, reluctantly, due to KerberosRenewalIT failures described below.
> > >
> > > Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
> > > contents/license stuffs/ITs.
> > >
> > > I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> > > times). It keeps timing out. It looks like it's supposed to finish between
> > > 8 and 9 minutes... an insanely long time for a *single* test to be
> > > running, IMO, especially one as narrowly focused as this one
> > > (ShellServerIT, for example, runs about that long, but covers a very broad
> > > spectrum of Accumulo behavior). This test ignores the scaling parameter,
> > > too, so it cannot be scaled with the timeout.factor system property.
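To illustrate the scaling remark above: a minimal sketch of how a per-test timeout might be multiplied by a factor such as the `timeout.factor` property. This is written in Python for brevity and is not Accumulo's actual test harness; `scaled_timeout_seconds` and the `properties` mapping are hypothetical stand-ins for a Java system-property lookup, and the fallback-to-1 behavior is an assumption.

```python
def scaled_timeout_seconds(base_seconds, properties):
    """Multiply base_seconds by the integer stored under 'timeout.factor'.

    Falls back to a factor of 1 when the property is missing,
    not an integer, or smaller than 1 (an assumption of this sketch).
    """
    raw = properties.get("timeout.factor")
    try:
        factor = int(raw) if raw is not None else 1
    except ValueError:
        factor = 1
    return base_seconds * max(factor, 1)


if __name__ == "__main__":
    # A 9-minute budget, doubled for slower hardware.
    print(scaled_timeout_seconds(9 * 60, {"timeout.factor": "2"}))
```

A test that hard-codes its timeout instead of going through a helper like this cannot be given extra time on slow machines, which appears to be the complaint being made.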
> > >
> > > The actual behavior of the test is to just create a table, put in data,
> > > scan it, then delete the table, every 5 seconds for 8 minutes minimum,
> > > under the assumption that the Kerberos ticket will expire at some point
> > > during that time period, and Accumulo will automatically renew it and
> > > continue functioning (the actual condition of expiration and renewal is
> > > never checked). This seems like something that should be mocked out on the
> > > object responsible for detecting and handling the renewal, and not an
> > > 8-9 minute integration test. It's not even clear from the current test
> > > which code is responsible for that (e.g. which code this test is testing).
> > >
> > > The most recent failure timed out after 9 minutes trying to create an
> > > Accumulo table. This could indicate that there's a problem with the ticket
> > > not renewing when there's an expiration waiting for a FATE operation... or
> > > it could just be that's where the test happened to be when the 9 minutes
> > > were up.
> > >
> > > Is anybody else experiencing problems with this test?
> > >
> > > In spite of this failure, I'm willing to give my +1 anyway, since I'm
> > > inclined to think this is simply an unreliable test.
> > >
> > > On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
> > >
> > >> I also verified the rfile fix.
> > >>
> > >> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> > >>> +1
> > >>>
> > >>> Did the following:
> > >>>
> > >>> * Was able to build Fluo against jars in staging repo.
> > >>> * Sigs check out for tarballs
> > >>> * No diffs between src tarball and rc2 branch
> > >>> * Looked at diffs between rc1 and rc2
> > >>>
> > >>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> > >>>> Accumulo Developers,
> > >>>>
> > >>>> Please consider the following candidate for Accumulo 1.7.3. This
> > >>>> candidate contains two changes from 1.7.3-rc1:
> > >>>>
> > >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> > >>>>   not fall back to accumulo-site.xml when on classpath.
> > >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> > >>>>   RFile PrintInfo
> > >>>>
> > >>>> Git Commit:
> > >>>>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >>>> Branch:
> > >>>>     1.7.3-rc2
> > >>>>
> > >>>> If this vote passes, a gpg-signed tag will be created using:
> > >>>>     git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 \
> > >>>>     38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >>>>
> > >>>> Staging repo:
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> > >>>> Source (official release artifact):
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> > >>>> Binary:
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> > >>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> > >>>> for a given artifact.)
> > >>>>
> > >>>> All artifacts were built and staged with:
> > >>>>     mvn release:prepare && mvn release:perform
> > >>>>
> > >>>> Signing keys are available at
> > >>>> https://www.apache.org/dist/accumulo/KEYS
> > >>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> > >>>>
> > >>>> Release notes (in progress) can be found at:
> > >>>> https://accumulo.apache.org/release_notes/1.7.3
> > >>>>
> > >>>> Please vote one of:
> > >>>> [ ] +1 - I have verified and accept...
> > >>>> [ ] +0 - I have reservations, but not strong enough to vote against...
> > >>>> [ ] -1 - Because..., I do not accept...
> > >>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> > >>>>
> > >>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> > >>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> P.S. Hint: download the whole staging repo with
> > >>>>     wget -erobots=off -r -l inf -np -nH \
> > >>>>     https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> > >>>>     # note the trailing slash is needed

--94eb2c191da0c2b7c0054a7d83f2--
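The suggestion in the thread to mock out the renewal logic, rather than wait 8 to 9 minutes of wall-clock time, could look roughly like the sketch below. It is a Python illustration under stated assumptions, not Accumulo or Kerby code: `FakeClock`, `TicketClient`, and `simulate` are hypothetical names, and the 80% renewal threshold is an arbitrary choice for the example.

```python
class FakeClock:
    """Manually advanced clock, so the test never sleeps."""

    def __init__(self):
        self.now = 0

    def advance(self, seconds):
        self.now += seconds


class TicketClient:
    """Hypothetical client holding a ticket with a fixed lifetime."""

    RENEW_FRACTION = 0.8  # arbitrary threshold chosen for this sketch

    def __init__(self, clock, lifetime_seconds, auto_renew=True):
        self.clock = clock
        self.lifetime = lifetime_seconds
        self.auto_renew = auto_renew
        self.issued_at = clock.now
        self.renewals = 0

    def do_operation(self):
        """Renew the ticket if it is old enough; return True if it is still valid."""
        age = self.clock.now - self.issued_at
        if self.auto_renew and age >= self.RENEW_FRACTION * self.lifetime:
            self.issued_at = self.clock.now
            self.renewals += 1
        return self.clock.now - self.issued_at < self.lifetime


def simulate(total_seconds, step_seconds, lifetime_seconds, auto_renew=True):
    """Run one operation per step of simulated time; return (all_ok, renewals)."""
    clock = FakeClock()
    client = TicketClient(clock, lifetime_seconds, auto_renew)
    all_ok = True
    while clock.now < total_seconds:
        all_ok = client.do_operation() and all_ok
        clock.advance(step_seconds)
    return all_ok, client.renewals


if __name__ == "__main__":
    # 8 simulated minutes, one operation every 5 seconds, 60-second tickets.
    ok, renewals = simulate(8 * 60, 5, 60)
    assert ok and renewals > 0  # the renewal itself is asserted, not assumed
    print(ok, renewals)
```

Because the clock is injected, the run finishes immediately, and the test can assert that a renewal actually happened, which is the condition the thread notes is never checked. Passing `auto_renew=False` shows the failure case the real test is meant to catch.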