accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Intermittent IT failures - was RE: [VOTE] Accumulo 1.7.3-rc2
Date Sun, 12 Mar 2017 00:17:04 GMT
This is a do-ocracy. Please just change the test if you believe you have a
better way to test what it is trying to test.
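For illustration, the mocking approach Christopher describes below (checking expiration and renewal directly instead of running an 8-9 minute integration test) could look roughly like this minimal Java sketch. Everything in it is hypothetical: `TicketRenewer`, its `Login` callback, and `FakeClock` are invented names, not Accumulo or Hadoop classes, and the thread does not identify which Accumulo code actually performs the renewal. The only point is that an injectable time source lets a test advance time and assert renewal explicitly.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these classes exist in Accumulo or Hadoop.
// A renewal policy with an injectable clock, so a test can advance time
// and assert that a ticket was renewed before it expired.
final class TicketRenewer {

  /** Hypothetical login callback: obtains a fresh ticket, returns its expiry (ms). */
  interface Login {
    long loginAndReturnExpiryMillis();
  }

  private final LongSupplier clock;
  private final Login login;
  private final long renewWindowMillis;
  private long expiryMillis;
  private int renewals;

  TicketRenewer(LongSupplier clock, Login login, long renewWindowMillis) {
    this.clock = clock;
    this.login = login;
    this.renewWindowMillis = renewWindowMillis;
    this.expiryMillis = login.loginAndReturnExpiryMillis(); // initial login
  }

  /** Call before each operation: re-login once the ticket is within the renewal window. */
  void checkAndRenew() {
    if (clock.getAsLong() >= expiryMillis - renewWindowMillis) {
      expiryMillis = login.loginAndReturnExpiryMillis();
      renewals++;
    }
  }

  /** True while the current ticket has not yet expired. */
  boolean isValid() {
    return clock.getAsLong() < expiryMillis;
  }

  int renewals() {
    return renewals;
  }
}

/** Manually advanced time source for tests; starts at 0 ms. */
final class FakeClock implements LongSupplier {
  long now;

  @Override
  public long getAsLong() {
    return now;
  }

  void advance(long millis) {
    now += millis;
  }
}
```

In a test, advancing `FakeClock` by 5 seconds 96 times covers the same 8 minutes of simulated operations as the IT. With illustrative values of a 60-second ticket lifetime and a 10-second renewal window, the ticket never lapses and `renewals()` reports 9, and the whole run takes milliseconds of wall-clock time.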

On Mar 11, 2017 18:43, "Christopher" <ctubbsii@apache.org> wrote:

> On Sat, Mar 11, 2017 at 5:15 PM Josh Elser <josh.elser@gmail.com> wrote:
>
> > Christopher,
> >
> > When I wrote that test, there were issues with the minimum functioning
> > renewal period as provided by the embedded KDC from Kerby. That is why
> > this test runs for so long -- anything shorter failed.
> >
> >
> I understand that. There was a comment in the code to that effect.
>
>
> > This test passed at one point. I don't run tests on my own hardware to
> > catch regressions anymore after previous discussions with you on this
> > matter.
>
>
> I don't understand what you mean by this, or how it applies. I'm sure it
> did pass at one point... and may still (hence my question to the group
> asking whether they observed it passing).
>
>
> > In the future, I'd suggest investing the time into investigating
> > why the test actually failed instead of picking apart the test itself.
> >
> >
> I did preliminary investigation, and forwarded my observations to the group
> for further discussion. I even suggested a possible cause for the failure.
> But I didn't think it would be productive to dig any deeper without first
> raising what I found to the group for further discussion and feedback.
>
> "picking apart the test itself" is also known as "reviewing code" and
> "investigating". I think you're taking my criticism of the code personally,
> and I'm not sure why. The fact is, I got as far as I could at 1AM on
> Saturday, and informed the group of what I experienced, because I thought
> it was relevant to the vote which expires on Monday morning. It seems that
> you'd prefer I postpone my comments until I have some kind of "perfect
> knowledge" of what went wrong with the test and how to fix it. Aside from
> the fact that I knew that I wasn't going to have time before the vote
> concluded on Monday, that makes no sense to me even under ideal
> circumstances... if we all did that, why would we even have a group? We're
> better when we rely on each other's expertise and knowledge, and discuss
> problems (or potential problems) as a team. I would like to see this test
> improved, but I knew that working on it in silence on my own was not going
> to achieve that.
>
>
> > Thanks.
> >
> > Ed Coleman wrote:
> > > I had commented on https://issues.apache.org/jira/browse/ACCUMULO-4602
> > > that I often have trouble with this test and a few others.
> > >
> > > Not sure it makes me feel any better, but for me, this is not "new" to
> > > 1.7.3. I thought it could be due to my virtual-box development
> > > environment, but I've tried running verify on an AWS c4.2xlarge instance
> > > with the same intermittent results. I have had it pass, but more often
> > > than not it fails.
> > >
> > > To help decide whether 1.7.3-rc0 could be a candidate, I made the
> > > following chart tracking IT issues, and then at one point
> > > KerberosRenewalIT passed for me (it passed a few times in a row) and I
> > > stopped updating the chart (x = failure observed on that instance):
> > >
> > > AWS1  AWS2  AWS3  OpenBox 1  OpenBox 2  OpenBox 3  Test
> > > -     x     x     x          -          -          AssignmentThreadsIT.testConcurrentAssignmentPerformance:91
> > > -     -     -     -          -          x          BadDeleteMarkersCreatedIT>AccumuloClusterIT.teardownCluster:223 » TestTimedOut
> > > -     x     -     x          -          -          ChaoticBalancerIT.test:80->Object.wait:502->Object.wait:-2 » TestTimedOut test...
> > > -     -     x     -          -          -          ConditionalWriterIT.testTrace:1476 » TestTimedOut test timed out after 60 seco...
> > > x     x     x     x          -          -          DurabilityIT.testWriteSpeed:103 log should be faster than flush
> > > -     -     x     -          -          -          FateStarvationIT.run:79 » Runtime java.lang.RuntimeException: org.apache.zooke...
> > > x     x     x     x          x          x          KerberosRenewalIT.testReadAndWriteThroughTicketLifetime » TestTimedOut test ti...
> > > x     -     x     -          -          -          ShellServerIT.trace:1444
> > > x     -     -     -          x          -          TabletStateChangeIteratorIT.test:100 No tables should need attention expected:<0> but was:<1>
> > > -     -     -     -          -          -          UnorderedWorkAssignerReplicationIT.dataWasReplicatedToThePeerWithoutDrain:548 » TableOffline
> > > -     -     -     -          -          x          KerberosReplicationIT.dataReplicatedToCorrectTable:224 » TestTimedOut test tim...
> > >
> > > I am seeing the same intermittent failures with 1.7.3-rc1 and 1.7.3-rc2.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Christopher [mailto:ctubbsii@apache.org]
> > > Sent: Saturday, March 11, 2017 1:53 AM
> > > To: Accumulo Dev List<dev@accumulo.apache.org>
> > > Subject: Re: [VOTE] Accumulo 1.7.3-rc2
> > >
> > > +1, reluctantly, due to KerberosRenewalIT failures described below.
> > >
> > > Verified hashes/sigs/javadoc jars/source jars/git SHA1/tarball
> > > contents/license stuffs/ITs.
> > >
> > > I could not get KerberosRenewalIT to pass at all (I tried half a dozen
> > > times). It keeps timing out. It looks like it's supposed to take between
> > > 8 and 9 minutes to finish... an insanely long time for a *single* test
> > > to be running, IMO, especially one as narrowly focused as this one
> > > (ShellServerIT, for example, runs about that long, but covers a very
> > > broad spectrum of Accumulo behavior). This test ignores the scaling
> > > parameter, too, so it cannot be scaled with the timeout.factor system
> > > property.
> > >
> > > The actual behavior of the test is to just create a table, put in data,
> > > scan it, then delete the table, every 5 seconds for 8 minutes minimum,
> > > under the assumption that the Kerberos ticket will expire at some point
> > > during that time period, and Accumulo will automatically renew it and
> > > continue functioning (the actual condition of expiration and renewal is
> > > never checked). This seems like something that should be mocked out on
> > > the object responsible for detecting and handling the renewal, and not
> > > an 8-9 minute integration test. It's not even clear from the current
> > > test which code is responsible for that (i.e., which code this test is
> > > testing).
> > >
> > > The most recent failure timed out after 9 minutes trying to create an
> > > Accumulo table. This could indicate that there's a problem with the
> > > ticket not renewing when there's an expiration waiting for a FATE
> > > operation... or it could just be that's where the test happened to be
> > > when the 9 minutes were up.
> > >
> > > Is anybody else experiencing problems with this test?
> > >
> > > In spite of this failure, I'm willing to give my +1 anyway, since I'm
> > > inclined to think this is simply an unreliable test.
> > >
> > > On Fri, Mar 10, 2017 at 5:45 PM Keith Turner <keith@deenlo.com> wrote:
> > >
> > >> I also verified the rfile fix.
> > >>
> > >> On Fri, Mar 10, 2017 at 5:38 PM, Keith Turner <keith@deenlo.com> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> Did the following:
> > >>>
> > >>>   * Was able to build Fluo against jars in staging repo.
> > >>>   * Sigs check out for tarballs
> > >>>   * No diffs between src tarball and rc2 branch
> > >>>   * Looked at diffs between rc1 and rc2
> > >>>
> > >>> On Fri, Mar 10, 2017 at 7:35 AM, Ed Coleman <dev1@etcoleman.com> wrote:
> > >>>
> > >>>> Accumulo Developers,
> > >>>>
> > >>>> Please consider the following candidate for Accumulo 1.7.3. This
> > >>>> candidate contains two changes from 1.7.3-rc1:
> > >>>>
> > >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4600 - shell does
> > >>>>   not fall back to accumulo-site.xml when on classpath.
> > >>>> - https://issues.apache.org/jira/browse/ACCUMULO-4597 - NPE from
> > >>>>   RFile PrintInfo
> > >>>>
> > >>>> Git Commit:
> > >>>>      38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >>>> Branch:
> > >>>>      1.7.3-rc2
> > >>>>
> > >>>> If this vote passes, a gpg-signed tag will be created using:
> > >>>>      git tag -f -m 'Apache Accumulo 1.7.3' -s rel/1.7.3 38d8a1d139eb21f0c9882be877db1b77aa1a45db
> > >>>>
> > >>>> Staging repo:
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065
> > >>>>
> > >>>> Source (official release artifact):
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-src.tar.gz
> > >>>>
> > >>>> Binary:
> > >>>> https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/org/apache/accumulo/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
> > >>>>
> > >>>> (Append ".sha1", ".md5", or ".asc" to download the signature/hash
> > >>>> for a given artifact.)
> > >>>>
> > >>>> All artifacts were built and staged with:
> > >>>>      mvn release:prepare && mvn release:perform
> > >>>>
> > >>>> Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
> > >>>> (Expected fingerprint: D87F9F417753D0C88598437EFC4368E0864BCC36)
> > >>>>
> > >>>> Release notes (in progress) can be found at:
> > >>>> https://accumulo.apache.org/release_notes/1.7.3
> > >>>>
> > >>>> Please vote one of:
> > >>>> [ ] +1 - I have verified and accept...
> > >>>> [ ] +0 - I have reservations, but not strong enough to vote against...
> > >>>> [ ] -1 - Because..., I do not accept...
> > >>>> ... these artifacts as the 1.7.3 release of Apache Accumulo.
> > >>>>
> > >>>> This vote will end on Mon Mar 13 13:00:00 UTC 2017
> > >>>> (Mon Mar 13 09:00:00 EDT 2017 / Mon Mar 13 06:00:00 PDT 2017)
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> P.S. Hint: download the whole staging repo with
> > >>>>      wget -erobots=off -r -l inf -np -nH \
> > >>>>      https://repository.apache.org/content/repositories/orgapacheaccumulo-1065/
> > >>>>      # note the trailing slash is needed
> >
>
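Christopher's message also notes that KerberosRenewalIT ignores the `timeout.factor` system property, so it cannot be given more time on slow machines. A minimal sketch of honoring it, assuming the property holds an integer multiplier that defaults to 1; `ScaledTimeout` is a hypothetical helper, not the actual API of Accumulo's IT base classes.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Accumulo's actual IT API: scales a test's base
// timeout by the integer "timeout.factor" system property (default 1).
final class ScaledTimeout {
  static final String FACTOR_PROPERTY = "timeout.factor";

  private ScaledTimeout() {}

  /** Multiplier from the system property; missing, unparsable, or < 1 yields 1. */
  static int factor() {
    // Integer.getInteger returns the default when the property is absent
    // or not a valid integer.
    int f = Integer.getInteger(FACTOR_PROPERTY, 1);
    return Math.max(1, f);
  }

  /** Base timeout converted to milliseconds and multiplied by the factor. */
  static long scaledMillis(long base, TimeUnit unit) {
    return Math.multiplyExact(unit.toMillis(base), (long) factor());
  }
}
```

With `timeout.factor` set to 3, a 9-minute base timeout becomes 27 minutes (1,620,000 ms). The scaled value could then be handed to something like JUnit 4's `Timeout.millis(...)` rule; how Accumulo's ITs actually wire this up is not shown in the thread.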
