commons-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: RESULT: Failed [VOTE] Release DBCP 1.3/1.4 - take three
Date Sun, 03 Jan 2010 20:53:15 GMT
On 03/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
> sebb wrote:
>  > On 03/01/2010, sebb <sebbaz@gmail.com> wrote:
>  >> On 02/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  > sebb wrote:
>  >>  >  > On 01/01/2010, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  >  >> Phil Steitz wrote:
>  >>  >  >>  > sebb wrote:
>  >>  >  >>  >> On 31/12/2009, Phil Steitz <phil.steitz@gmail.com> wrote:
>  >>  >  >>  >>> Comments have not changed sebb's -1, so I am going to consider this
>  >>  >  >>  >>>  a failed VOTE and roll another RC with documentation fixes already
>  >>  >  >>  >>>  made included and attempt at clearer release notes and README.
>  >>  >  >>  >>>
>  >>  >  >>  >>>  Thanks, all for review and sorry to take so long to get this right.
>  >>  >  >>  >> Please note that I am still seeing the occasional test failures (even
>  >>  >  >>  >> after the test bug fix).
>  >>  >  >>  >> As a result, I did not revisit the -1 for the compilation problems -
>  >>  >  >>  >> the test failure seems like a -1 to me as well.
>  >>  >  >>  >
>  >>  >  >>  > In that case, I am honestly inclined to just remove / disable the
>  >>  >  >>  > tests.  As I said before, they are fragile and frankly half-baked.
>  >>  >  >>  > Unfortunately, they did uncover a real bug recently, so I am of two
>  >>  >  >>  > minds on this.
>  >>  >  >>  >
>  >>  >  >>  > What is going on in the most recent failure you reported (line 376
>  >>  >  >>  > of TestPerUserPoolDataSource) is a ThreadGroup is started launching
>  >>  >  >>  > 2 * maxActive threads, all of which try to get connections, hold
>  >>  >  >>  > them for (sic) 1 ms and then release them.  MaxWait is 100 ms and
>  >>  >  >>  > maxActive is 10.   This "should" work as the effective throughput
>  >>  >  >>  > should be ~10 requests / ms (that assumes perfect efficiency and no
>  >>  >  >>  > execution time, which is not quite right), so 20 requests should
>  >>  >  >>  > complete in ~20 ms.
>  >>  >  >>
>  >>  >  >>
>  >>  >  >> Sorry - that should be 2 ms.
>  >>  >  >
>  >>  >  > If maxWait is 100ms, and each thread waits 1ms, surely this should always work?
>  >>  >  > Even if each wait actually takes 50ms, the first 10 requests will take
>  >>  >  > approx 50ms, and the remaining 10 requests will then get their
>  >>  >  > connections.
>  >>  >  >
>  >>  >  > In the tests I ran last year (!), at least some of the failed tests
>  >>  >  > showed that 10 of the threads timed out, i.e. none of the original 10
>  >>  >  > threads had finished. It seems a bit unlikely that this is really an
>  >>  >  > issue with the processing times.
>  >>  >  >
>  >>  >  > I think this needs closer investigation - I'll try and add some more
>  >>  >  > debug for the failed cases.
>  >>  >
>  >>  >
>  >>  > Thanks.  I just completed 1000 runs each using Apple 1.5, 1.6, Sun
>  >>  >  1.6 and JRockit 1.4 (last two on Ubuntu 9.10) with no failures.
>  >>
>  >>
>  >> Any tests using multiple core systems?
>  >>
>  >>
>  >>  >  You are correct that with maxActive = 10, throughput should be
>  >>  >  nearly 10/ms, so 20 should finish in 2ms.  There are three things
>  >>  >  that can dampen the throughput:
>  >>  >
>  >>  >  1) Elapsed time between when a thread invokes sleep(1) and performs
>  >>  >  its next action (which is to return the connection it is holding)
>  >>  >  2) Elapsed time waiting for a waiting thread to respond to notify
>  >>  >  3) There is a trivial amount of code executed by the threads holding
>  >>  >  the connections and of course the pool itself executes some code.
>  >>  >
>  >>  >  What JDK are you using when you see these failures?
>  >>
>  >>
>  >> java version "1.6.0_17"
>  >>  Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
>  >>  Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
>  >>
>  >>  This is on Windows XP, dual-processor (Centrino).
>  >>
>  >>  There is another bug in the test - it does not wait for all the
>  >>  threads to finish.
>  >>  However, I don't think this affects the result, as the first test is
>  >>  the one that fails, so there can't be any threads at that point.
>  >>  However it could affect the second test, as the same driver and pool
>  >>  is used. The two tests should probably be separate test cases.
>  >>
>  >
>  > When a test fails for me, 10 threads get timeouts.
>  > All the first 10 threads take longer than 100ms to complete and all
>  > take about the same amount of time (within 5ms or so).
>
>
> There should be 20 threads launched by the test that does not expect
>  timeouts.  So 10 are completing in time and 10 are timing out?

10 complete without any failures, however they all take over 100ms to
complete - e.g. 160ms or 200ms - and so the other 10 threads suffer
timeouts.


> >
>  > This does not seem to be due to cpu starvation, because the timeouts
>  > occur some while before the first 10 threads complete. This suggests
>  > to me that the JVM is not being stalled by garbage collection or
>  > external activities.
>
>
> I doubt it is either CPU starvation or garbage collection, but it
>  could be clock resolution or thread scheduling.

Looks like it might be thread scheduling.

I've added some System.nanoTime() calls around all the method calls in
the run() method, and so far all the failures occur when
Thread.sleep(1) takes much longer than 1ms.

Normally, this only takes 1-30ms, but every so often the sleep lasts for 100+ms.
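
For anyone who wants to reproduce the measurement, here is a minimal
sketch of the kind of timing I mean. The class and method names
(SleepTiming, timeSleepNanos) are made up for illustration; this is not
the code in TestPerUserPoolDataSource, just the same idea in isolation.

```java
class SleepTiming {
    /** Returns the elapsed time, in nanoseconds, of one Thread.sleep(millis) call. */
    static long timeSleepNanos(long millis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller; the elapsed time is still reported.
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long worst = 0;
        // Repeat the 1 ms sleep and keep the longest observed duration.
        for (int i = 0; i < 200; i++) {
            worst = Math.max(worst, timeSleepNanos(1));
        }
        System.out.println("worst sleep(1): " + (worst / 1000000.0) + " ms");
    }
}
```

The worst case will vary with OS, JVM and load, so a single run on a
quiet machine may not show the long sleeps at all.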

Not quite sure how to fix this.
Perhaps increase maxWait() for this particular test? It will need to
be at least 350ms, judging by some of the recent test runs.

The debugging also shows clearly that the threads started by the test
case do not finish before the method completes. In fact in one test,
the test method multipleThreads() finished (and returned the value of
success[0]) before the first thread completed. As it happened, the
first thread failed, but of course the failure was not caught because
the success[0] variable had already been read.

I can fix this by using Thread.join(). This will also guarantee that
the success[0] variable is made visible to the test thread, which at
present is not necessarily the case.
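
A rough sketch of the join() approach, assuming a shared success[0]
flag like the one in the test. JoinBeforeAssert and runWorkers are
hypothetical names, and the sleep(1) is only a stand-in for borrowing
and returning a connection; the real change to the test may look
different.

```java
class JoinBeforeAssert {
    /** Starts n workers, waits for all of them, then reports whether every one succeeded. */
    static boolean runWorkers(int n) {
        final boolean[] success = { true }; // shared flag, only ever cleared by workers
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(1); // stand-in for get connection / hold / release
                    } catch (InterruptedException e) {
                        success[0] = false; // treat an interrupted worker as a failure
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < n; i++) {
            try {
                // join() returns only after the worker has terminated, and the
                // worker's writes are visible to this thread once it returns.
                threads[i].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return success[0]; // read only after every worker has finished
    }

    public static void main(String[] args) {
        System.out.println("all workers succeeded: " + runWorkers(20));
    }
}
```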

>
>  >
>  > I don't know yet which part of the thread is taking the most time.
>  > I'll add more detailed timers tomorrow; hopefully this will give a
>  > better clue as to what is happening.
>  >
>  >>  >  One thing to look at to rule out a [pool] bug is to see if you get
>  >>  >  failures using pool 1.4.
>  >>  >
>  >>
>  >>
>  >> Not sure I follow - the pom specifies pool 1.5.4, so why would
>  >>  using pool 1.4 help?
>  >>
>  >>
>  >>  >
>  >>  >  >
>  >>  >  >>   The test waits 100 ms.  Given the fact that
>  >>  >  >>  > perfect efficiency is obviously unrealistic, you can see that
>  >>  >  >>  > especially with bad clock resolution and poor thread management
>  >>  >  >>  > performance (Windoz is known for both), this is going to fail now
>  >>  >  >>  > and then. FWIW, I have not seen a failure on OS X or Ubuntu (as OS X
>  >>  >  >>  > guest) since sebb's last patch.
>  >>  >  >>  >
>  >>  >  >>  > Barring objections, I am leaning toward removing the tests.
>  >>  >  >>  >
>  >>  >  >>  > Phil
>  >>  >  >>  >> I hope to try and look at the failures again tomorrow.
>  >>  >  >>  >>
>  >>  >  >>  >> It would be helpful if others could try running the failing test as
>  >>  >  >>  >> well (you'll need a script to do this as it only fails about 1% of the
>  >>  >  >>  >> time or less)
>  >>  >  >>  >>
>  >>  >  >>  >>>  Phil
>  >>  >  >>  >>>
>  >>  >  >>  >>>  Phil Steitz wrote:
>  >>  >  >>  >>>  > Hopefully all problems with JDK versions and the site build have now
>  >>  >  >>  >>>  > been resolved.  As previously discussed, the only difference between
>  >>  >  >>  >>>  > 1.3 and 1.4 is that the 1.3 sources have been filtered to exclude
>  >>  >  >>  >>>  > JDBC4 methods.  Version 1.3 is for JDK 1.4-1.5 and only builds under
>  >>  >  >>  >>>  > one of these JDKs.  Note that to execute the 1.3 maven build under
>  >>  >  >>  >>>  > JDK 1.4 you need a 2.0.x version of maven.
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > Here are the artifacts:
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > 1.3 (JDBC 3) version:
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6/site
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.3-rc6/maven
>  >>  >  >>  >>>  > http://svn.apache.org/repos/asf/commons/proper/dbcp/tags/DBCP_1_3_RC6/
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > 1.4 (JDBC 4) version:
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6/site
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/dbcp-1.4-rc6/maven
>  >>  >  >>  >>>  > http://svn.apache.org/repos/asf/commons/proper/dbcp/tags/DBCP_1_4_RC6/
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > Release notes (common version, ships with both)
>  >>  >  >>  >>>  > http://people.apache.org/~psteitz/RELEASE-NOTES.txt
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > Votes, please. This VOTE will close 01-January-2010 03:30 GMT.
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > [ ] +1 Proceed with release
>  >>  >  >>  >>>  > [ ] +0 OK
>  >>  >  >>  >>>  > [ ] -0 OK, but I would prefer...
>  >>  >  >>  >>>  > [ ] -1 No, showstopper = ...
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > Thanks!
>  >>  >  >>  >>>  >
>  >>  >  >>  >>>  > Phil
>  >>  >  >>  >>>
>  >>  >  >>  >>>
>  >>  >  >>  >>>  ---------------------------------------------------------------------
>  >>  >  >>  >>>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  >>  >  >>  >>>  For additional commands, e-mail: dev-help@commons.apache.org
>  >>  >  >>  >>>
>  >>  >  >>  >>>