commons-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: RESULT: Failed [VOTE] Release DBCP 1.3/1.4 - take three
Date Mon, 04 Jan 2010 11:59:50 GMT
On 03/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
> sebb wrote:
>  > On 03/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >> sebb wrote:
>  >>  > On 03/01/2010, sebb <sebbaz@gmail.com> wrote:
>  >>  >> On 02/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  >>  > sebb wrote:
>  >>  >>  >  > On 01/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  >>  >  >> Phil Steitz wrote:
>  >>  >>  >  >>  > sebb wrote:
>  >>  >>  >  >>  >> On 31/12/2009, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  >>  >  >>  >>> Comments have not changed sebb's -1, so I am going to consider this
>  >>  >>  >  >>  >>>  a failed VOTE and roll another RC with documentation fixes already
>  >>  >>  >  >>  >>>  made included and attempt at clearer release notes and README.
>  >>  >>  >  >>  >>>
>  >>  >>  >  >>  >>>  Thanks, all for review and sorry to take so long to get this right.
>  >>  >>  >  >>  >> Please note that I am still seeing the occasional test failures (even
>  >>  >>  >  >>  >> after the test bug fix).
>  >>  >>  >  >>  >> As a result, I did not revisit the -1 for the compilation problems -
>  >>  >>  >  >>  >> the test failure seems like a -1 to me as well.
>  >>  >>  >  >>  >
>  >>  >>  >  >>  > In that case, I am honestly inclined to just remove / disable the
>  >>  >>  >  >>  > tests.  As I said before, they are fragile and frankly half-baked.
>  >>  >>  >  >>  > Unfortunately, they did uncover a real bug recently, so I am of two
>  >>  >>  >  >>  > minds on this.
>  >>  >>  >  >>  >
>  >>  >>  >  >>  > What is going on in the most recent failure you reported (line 376
>  >>  >>  >  >>  > of TestPerUserPoolDataSource) is a ThreadGroup is started launching
>  >>  >>  >  >>  > 2 * maxActive threads, all of which try to get connections, hold
>  >>  >>  >  >>  > them for (sic) 1 ms and then release them.  MaxWait is 100 ms and
>  >>  >>  >  >>  > maxActive is 10.   This "should" work as the effective throughput
>  >>  >>  >  >>  > should be ~10 requests / ms (that assumes perfect efficiency and no
>  >>  >>  >  >>  > execution time, which is not quite right), so 20 requests should
>  >>  >>  >  >>  > complete in ~20 ms.
>  >>  >>  >  >>
>  >>  >>  >  >>
>  >>  >>  >  >> Sorry - that should be 2 ms.
>  >>  >>  >  >
>  >>  >>  >  > If maxWait is 100ms, and each thread waits 1ms, surely this should
>  >>  >>  >  > always work?
>  >>  >>  >  > Even if each wait actually takes 50ms, the first 10 requests will take
>  >>  >>  >  > approx 50ms, and the remaining 10 requests will then get their
>  >>  >>  >  > connections.
>  >>  >>  >  >
>  >>  >>  >  > In the tests I ran last year (!), at least some of the failed tests
>  >>  >>  >  > showed that 10 of the threads timed out, i.e. none of the original 10
>  >>  >>  >  > threads had finished. It seems a bit unlikely that this is really an
>  >>  >>  >  > issue with the processing times.
>  >>  >>  >  >
>  >>  >>  >  > I think this needs closer investigation - I'll try and add some more
>  >>  >>  >  > debug for the failed cases.
>  >>  >>  >
>  >>  >>  >
>  >>  >>  > Thanks.  I just completed 1000 runs each using Apple 1.5, 1.6, Sun
>  >>  >>  >  1.6 and JRockit 1.4 (last two on Ubuntu 9.10) with no failures.
>  >>  >>
>  >>  >>
>  >>  >> Any tests using multiple core systems?
>  >>  >>
>  >>  >>
>  >>  >>  >  You are correct that with maxActive = 10, throughput should be
>  >>  >>  >  nearly 10/ms, so 20 should finish in 2ms.  There are three things
>  >>  >>  >  that can dampen the throughput:
>  >>  >>  >
>  >>  >>  >  1) Elapsed time between when a thread invokes sleep(1) and performs
>  >>  >>  >  its next action (which is to return the connection it is holding)
>  >>  >>  >  2) Elapsed time waiting for a waiting thread to respond to notify
>  >>  >>  >  3) There is a trivial amount of code executed by the threads holding
>  >>  >>  >  the connections and of course the pool itself executes some code.
>  >>  >>  >
>  >>  >>  >  What JDK are you using when you see these failures?
>  >>  >>
>  >>  >>
>  >>  >> java version "1.6.0_17"
>  >>  >>  Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
>  >>  >>  Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
>  >>  >>
>  >>  >>  This is on Windows XP, dual-processor (Centrino).
>  >>  >>
>  >>  >>  There is another bug in the test - it does not wait for all the
>  >>  >>  threads to finish.
>  >>  >>  However, I don't think this affects the result, as the first test is
>  >>  >>  the one that fails, so there can't be any threads at that point.
>  >>  >>  However it could affect the second test, as the same driver and pool
>  >>  >>  is used. The two tests should probably be separate test cases.
>  >>  >>
>  >>  >
>  >>  > When a test fails for me, 10 threads get timeouts.
>  >>  > All the first 10 threads take longer than 100ms to complete and all
>  >>  > take about the same amount of time (within 5ms or so).
>  >>
>  >>
>  >> There should be 20 threads launched by the test that does not expect
>  >>  timeouts.  So 10 are completing in time and 10 are timing out?
>  >
>  > 10 complete without any failures, however they all take over 100ms to
>  > complete - e.g. 160ms or 200ms - and so the other 10 threads suffer
>  > timeouts.
>  >
>  >
>  >>  > This does not seem to be due to cpu starvation, because the timeouts
>  >>  > occur some while before the first 10 threads complete. This suggests
>  >>  > to me that the JVM is not being stalled by garbage collection or
>  >>  > external activities.
>  >>
>  >>
>  >> I doubt it is either CPU starvation or garbage collection, but it
>  >>  could be clock resolution or thread scheduling.
>  >
>  > Looks like it might be thread scheduling.
>  >
>  > I've added some System.nanoTime() calls around all the method calls in
>  > the run() method, and so far all the failures occur when
>  > Thread.sleep(1) takes much longer than 1ms.
>
>
> Yes, I read somewhere that this is not guaranteed to complete in <
>  10 ms on any platform and can take longer on Windows.
>
>
>  > Normally, this only takes 1-30ms, but every so often the sleep lasts for 100+ms.
>  >
>  > Not quite sure how to fix this.
>  > Perhaps increase maxWait() for this particular test? It will need to
>  > be at least 350ms, judging by some of the recent test runs.
>
>
> I thought about doing that, but that sort of defeats the purpose of
>  the test.  This is one reason that I was thinking about disabling
>  it, but as I said before, these tests did point to a real bug
>  before, so I would actually like to rectify if possible.  Maybe just
>  increasing maxWait (for this case only) is a good idea.

This is what I have done.

Does a long maxWait affect the validity of the sleep(1) test?

>
>  >
>  > The debugging also shows clearly that the threads started by the test
>  > case do not finish before the method completes. In fact in one test,
>  > the test method multipleThreads() finished (and returned the value of
>  > success[0]) before the first thread completed. As it happened, the
>  > first thread failed, but of course the failure was not caught because
>  > the success[0] variable had already been read.
>  >
>  > I can fix this by using Thread.join(). This will also guarantee that
>  > the success[0] variable is made visible to the test thread, which at
>  > present is not necessarily the case.
>
>
> That would probably be an improvement.

Turns out it is essential - otherwise how can the method know if a
still active thread is going to fail later? This happened at least
twice in my testing.
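
A minimal sketch of the join() change, not the actual DBCP test code. The
class name JoinBeforeRead and the method runAll are made up for
illustration, and the lambda syntax assumes a much newer Java than the
JDKs discussed in this thread. It only shows the idea: start the workers,
join() every one of them, and read success[0] afterwards, so that a
worker failing late is still seen.

```java
class JoinBeforeRead {

    /**
     * Starts one thread per task, waits for every thread to finish,
     * and only then reads the shared success flag.
     * (Hypothetical helper, not part of DBCP.)
     */
    static boolean runAll(Runnable[] tasks) {
        final boolean[] success = { true };   // shared flag, in the style of success[0]
        Thread[] threads = new Thread[tasks.length];

        for (int i = 0; i < tasks.length; i++) {
            final Runnable task = tasks[i];
            threads[i] = new Thread(() -> {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    success[0] = false;       // every failing worker stores the same value
                }
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                // join() returns only after the thread has terminated, and the
                // thread's writes are visible to the caller once it returns.
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return success[0];
    }

    public static void main(String[] args) {
        Runnable ok = () -> { };
        Runnable lateFailure = () -> {
            try {
                Thread.sleep(50);             // fail well after the other worker is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("simulated late failure");
        };
        System.out.println(runAll(new Runnable[] { ok, ok }));
        System.out.println(runAll(new Runnable[] { ok, lateFailure }));
    }
}
```

Without the join() loop, runAll could return while lateFailure is still
sleeping, which is the situation described above.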

>  I am about to jump on a plane, so will be dark until tomorrow (US
>  EST).  Thanks for chasing this down.  Pls go ahead and commit
>  improvements if you can get it working consistently and you are
>  satisfied that there are no real bugs lurking.
>

Together with some other improvements, I think the test is now much
more consistent.
It may still fail if sleep(1) lasts longer than 430ms (I saw 420 in
testing) but this should now be extremely rare, and it should be
obvious when this occurs as it will affect the test elapsed time.
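
To make the timing argument concrete, here is a rough model of the
arithmetic, assuming an idealized pool in which threads are served in
waves of maxActive and each wave holds its connections for the same
length of time. WaveModel and timeouts are hypothetical names, the model
ignores scheduling and pool overhead, and it is only meaningful for the
two-wave case (2 * maxActive threads) used by the test. The 430 passed
as maxWait below is just the figure quoted above, used for illustration.

```java
class WaveModel {

    /**
     * Counts the requests that would exceed maxWait if each wave of
     * maxActive threads holds its connections for holdMillis.
     * (Idealized sketch, not the DBCP implementation.)
     */
    static int timeouts(int threads, int maxActive, long holdMillis, long maxWaitMillis) {
        int timedOut = 0;
        for (int i = 0; i < threads; i++) {
            long wave = i / maxActive;            // 0 for the first maxActive threads
            long waitMillis = wave * holdMillis;  // time spent waiting for a free connection
            if (waitMillis > maxWaitMillis) {
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // 20 threads, maxActive = 10, as in the test discussed in this thread.
        System.out.println(timeouts(20, 10, 1, 100));    // sleep(1) behaves as requested
        System.out.println(timeouts(20, 10, 50, 100));   // each hold takes 50ms
        System.out.println(timeouts(20, 10, 160, 100));  // each hold takes 160ms
        System.out.println(timeouts(20, 10, 420, 430));  // 420ms hold with a longer maxWait
    }
}
```

Under this model only the third call reports timeouts (10 of them), which
matches the failures where the first 10 threads took 160ms or more.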

I suppose it might be worth timing the sleep and reporting when it
lasts much longer than expected; the report could say that a test
failure in that case is not significant.
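
One way such a check might look, as a sketch only: SleepCheck,
timedSleepMillis and isOverrun are hypothetical names, and the 100ms
tolerance is an arbitrary choice for illustration. It wraps
Thread.sleep() with System.nanoTime() calls, as in the debugging
described earlier, and flags sleeps that overrun.

```java
class SleepCheck {

    /** Sleeps for the requested time and returns the measured elapsed time in ms. */
    static long timedSleepMillis(long requestedMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(requestedMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    /** True if the sleep lasted more than toleranceMillis longer than requested. */
    static boolean isOverrun(long actualMillis, long requestedMillis, long toleranceMillis) {
        return actualMillis > requestedMillis + toleranceMillis;
    }

    public static void main(String[] args) {
        long requested = 1;
        long tolerance = 100;                     // assumed threshold, not from the test
        long actual = timedSleepMillis(requested);
        if (isOverrun(actual, requested, tolerance)) {
            System.out.println("sleep(" + requested + ") took " + actual
                    + "ms; a timeout in this run is probably not significant");
        } else {
            System.out.println("sleep(" + requested + ") took " + actual + "ms");
        }
    }
}
```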

>  Phil
>
>
>  >>  >
>  >>  > I don't know yet which part of the thread is taking the most time.
>  >>  > I'll add more detailed timers tomorrow; hopefully this will give a
>  >>  > better clue as to what is happening.
>  >>  >
>  >>  >>  >  One thing to look at to rule out a [pool] bug is to see if you get
>  >>  >>  >  failures using pool 1.4.
>  >>  >>  >
>  >>  >>
>  >>  >>
>  >>  >> Not sure I follow - the pom specifies pool 1.5.4, so why would
>  >>  >>  using pool 1.4 help?
>  >>  >>
>  >>  >>
>  >>  >>  >
>  >>  >>  >  >
>  >>  >>  >  >>   The test waits 100 ms.  Given the fact that
>  >>  >>  >  >>  > perfect efficiency is obviously unrealistic, you can see that
>  >>  >>  >  >>  > especially with bad clock resolution and poor thread management
>  >>  >>  >  >>  > performance (Windoz is known for both), this is going to fail now
>  >>  >>  >  >>  > and then. FWIW, I have not seen a failure on OS X or Ubuntu (as OS X
>  >>  >>  >  >>  > guest) since sebb's last patch.
>  >>  >>  >  >>  >
>  >>  >>  >  >>  > Barring objections, I am leaning toward removing the tests.
>  >>  >>  >  >>  >
>  >>  >>  >  >>  > Phil
>  >>  >>  >  >>  >> I hope to try and look at the failures again tomorrow.
>  >>  >>  >  >>  >>
>  >>  >>  >  >>  >> It would be helpful if others could try running the failing test as
>  >>  >>  >  >>  >> well (you'll need a script to do this as it only fails about 1% of the
>  >>  >>  >  >>  >> time or less)
>  >>  >>  >  >>  >>
>  >>  >>  >  >>  >>>  Phil
>  >>  >>  >  >>  >>>
>  >>  >>  >  >>  >>>  Phil Steitz wrote:
>  >>  >>  >  >>  >>>  > Hopefully all problems with JDK versions and the site build have now
>  >>  >>  >  >>  >>>  > been resolved.  As previously discussed, the only difference between
>  >>  >>  >  >>  >>>  > 1.3 and 1.4 is that the 1.3 sources have been filtered to exclude
>  >>  >>  >  >>  >>>  > JDBC4 methods.  Version 1.3 is for JDK 1.4-1.5 and only builds under
>  >>  >>  >  >>  >>>  > one of these JDKs.  Note that to execute the 1.3 maven build under
>  >>  >>  >  >>  >>>  > JDK 1.4 you need a 2.0.x version of maven.
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > Here are the artifacts:
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > 1.3 (JDBC 3) version:
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6/site
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6/maven
>  >>  >>  >  >>  >>>  > http://svn.apache.org/repos/asf/commons/proper/dbcp/tags/DBCP_1_3_RC6/
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > 1.4 (JDBC 4) version:
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6/site
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6/maven
>  >>  >>  >  >>  >>>  > http://svn.apache.org/repos/asf/commons/proper/dbcp/tags/DBCP_1_4_RC6/
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > Release notes (common version, ships with both)
>  >>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/RELEASE-NOTES.txt
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > Votes, please. This VOTE will close 01-January-2010 03:30 GMT.
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > [ ] +1 Proceed with release
>  >>  >>  >  >>  >>>  > [ ] +0 OK
>  >>  >>  >  >>  >>>  > [ ] -0 OK, but I would prefer...
>  >>  >>  >  >>  >>>  > [ ] -1 No, showstopper = ...
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > Thanks!
>  >>  >>  >  >>  >>>  >
>  >>  >>  >  >>  >>>  > Phil
>  >>  >>  >  >>  >>>
>  >>  >>  >  >>  >>>
>  >>  >>  >  >>  >>>  ---------------------------------------------------------------------
>  >>  >>  >  >>  >>>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  >>  >>  >  >>  >>>  For additional commands, e-mail: dev-help@commons.apache.org


