db-derby-dev mailing list archives

From Myrna van Lunteren <m.v.lunte...@gmail.com>
Subject Re: [VOTE] release
Date Fri, 21 Oct 2011 22:32:30 GMT
On Fri, Oct 21, 2011 at 12:41 PM, Mike Matrigali
<mikem_app@sbcglobal.net> wrote:
> Rick Hillegas wrote:
>> -0
>> I am tempted to vote -1 based on DERBY-5430. The 10.8.2 release candidates
>> produce a deadlock in NsTest. That deadlock was not seen in 10.8.1 or
>> earlier releases.
> If we had a reproducible case for DERBY-5430 I would agree; then we could, in
> the very worst case, binary search for the change in 10.8 that
> caused the issue.  I've tried this but failed, and I see very inconsistent
> results using nstest.  On exactly the same codeline/machine/environment it
> will pop after 1 hour and then not after days.  I have also reviewed all
> the changes in 10.8 since the previous release and cannot come up with
> anything that looks likely to cause this kind of problem.
>> However, I do not have any confidence in NsTest as a release barrier. This
>> test suffers from a number of defects which severely cripple its usefulness:
>> 1) No-one seems to understand this test.
>> 2) The test is not being run in its preferred configuration. The "Ns" in
>> NsTest means "Network Server" I think, but as far as I can see the test is
>> only being run embedded.
> I was around when this test was being developed.  Originally I believe we
> were looking for a network specific test to add to embedded stress tests we
> had.  But when we looked at what resulted, there was nothing
> network specific about it, and in fact it was found to be more stressful
> when run in embedded mode.  I agree that if we had the resources we should
> run it in both modes (and maybe even alter its various parameters to change
> what it stresses).  For instance, I think it currently also runs only
> on encrypted databases and thus does not stress other, more "normal" paths.
>> 3) The test produces reams of errors. I don't think we know how to strain
>> signal out of this noise. The sheer volume of errors suggests that the test
>> is badly written and that it does not model a sensible workload.
> I go back and forth on this.  As a developer I believe if I wrote this
> test I would not have it act this way.  But one original objective of the
> stress test was to stress unexpected paths not being tested by others.
>> 4) The person who runs this test (Myrna) has lost confidence in its
>> ability to disclose regressions, as evidenced by the downgrading of the
>> urgency of DERBY-5430.
>> I do not think that we should use NsTest as a release barrier again until
>> we address its defects.
> I think release managers should look at the result of this test and make
> their own determination.  If many ASSERTS or other system errors (like
> DERBY-5422) or server crashes start coming from this test then it is giving
> good feedback.  We would not have seen DERBY-5423 without this test, and I
> believe that would have been a severe problem for existing
> user applications.
> So I agree that nstest failing should not necessarily mean a release should
> be blocked.  Unfortunately its results need to be interpreted and
> a decision made by the community/release manager on whether it should block
> or not.  It has shown up real bugs in the past that all other
> tests have missed, so I don't want to throw it out.  It is too bad that its
> signal-to-noise ratio is so low.
>> Thanks,
>> -Rick

I'm voting +1 to release

I confirm that I did see the deadlock of DERBY-5430 with - so
even after Rick's backing out the fix for DERBY-4377. I thought Rick
had also seen this in a build off the branch after the backing out?
Perhaps I misread the comments in DERBY-5430.

I decided to lower the priority of DERBY-5430 for 2 reasons:
- nstest is not a very consistent test for finding this issue.
  I can only state that I've *not* seen DERBY-5430 in release cycles
before (at least not nor, which doesn't
mean it didn't exist.
 (As an aside, note that I also did not see DERBY-5454 again with (deadlock on select max) which I had expected to see...)
- a number of people have looked through all the changes and stated
that none of them appears to be an obvious cause of this issue.

After finishing up the release work, I will take some time to go on a
binary search and see if I can find a check-in which caused nstest
with embedded to see this (or to see it more easily, assuming it
existed before). But this will be a very slow process; it might take
a month or more.
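
A rough sketch of how a binary search over check-ins might cope with a test
that only sometimes shows the deadlock: each candidate revision is tried
several times and is treated as bad if any single run deadlocks. The names
here (first_bad_revision, shows_deadlock, runs_per_revision) are hypothetical
and are not part of nstest or the Derby build. A "good" verdict is only as
reliable as the number and length of the runs behind it, which is why the
process is slow.

```python
from typing import Callable, Sequence


def first_bad_revision(
    revisions: Sequence[int],
    shows_deadlock: Callable[[int], bool],
    runs_per_revision: int = 3,
) -> int:
    """Return the earliest revision at which the deadlock is observed.

    Assumes revisions are in check-in order, revisions[0] is known good
    and revisions[-1] is known bad. shows_deadlock is a hypothetical
    callback that builds the given revision, runs the stress test once,
    and reports whether the deadlock appeared.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # The test is inconsistent, so one clean run proves little:
        # repeat it and call the revision bad if any run deadlocks.
        if any(shows_deadlock(revisions[mid]) for _ in range(runs_per_revision)):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

With N check-ins this needs about log2(N) candidate revisions, each costing
runs_per_revision full test runs, so the run count and run length dominate
the total time.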

Re nstest with Embedded vs. Network Server - for the past 4 releases
or so I've run nstest both ways - the embedded configuration on
Windows and the network server configuration on a Linux machine. I've
consistently logged the results on the platform testing page. The
Network Server test didn't show deadlocks, which is clearly stated in

