river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: test failure repeatability
Date Mon, 01 Apr 2013 17:01:18 GMT
Are you still using River's TaskManager? When I was looking at its 
performance I noticed some things that seemed dubious to me from the 
point of view of concurrency. I may still have some notes that would be 
useful for a concurrency bug hunt.

Patricia

On 4/1/2013 4:51 AM, Bryan Thompson wrote:
> I am uncomfortable with a release that has known concurrency problems.  I
> am also uncomfortable that the custom Levels serialization change by
> Oracle [1] has broken the current release for new JVMs.  I would like to
> see a minor release which fixes that serialization problem and a candidate
> release which gives people a chance to discovery concurrent issues without
> risking a release that is known to be unstable.
>
> I'm happy to review a few of the classes with known concurrency problems
> to see if I can help nail some of these bugs.  Since I do not know the
> river internals, I would only be able to spot concurrency problems that
> exist within a class.  I am not in a good position to comment on
> concurrency problems that might arise through the interactions among
> classes.
>
> Bryan
>
> [1] https://issues.apache.org/jira/browse/RIVER-416
>
> On 4/1/13 6:14 AM, "Peter Firmstone" <jini@zeus.net.au> wrote:
>
>> The attachments will be removed from the list, so I've cc'd you, anyone
>> who's interested, let me know I can forward the attachments.  They can
>> be opened with jvisualvm.
>>
>> The profiling isn't perfect, the test runs for about 8.5 minutes, so
>> hotspot should have kicked in relatively early in both test runs.
>>
>> I guess a significant problem is; the more I remove performance
>> impedances, like unnecessary DNS calls, the faster multithreading and
>> context switching gets.
>>
>> Not only did the old policy providers create contention, but it was
>> slower for single threaded performance (I'll have to run the previous
>> release branch for comparison when I get some time).
>>
>> The URIGrant.implies call is now down to .228 ms per invocation, down
>>from 1.68 ms per invocation this week, which was already quite good (on
>> old UltraSparcII hardware), during stress tests this method is called
>> almost 40,000 times.
>>
>> In comparison the old policy provider which required a DNS call (every
>> time CodeSource.implies is called, functionality now replaced by
>> URIGrant.implies), the old policy provider also cached all Permission's
>> in highly contended PermissionCollection's, which during network calls
>> invoked SocketPermission.implies, possibly for every SocketPermission in
>> the PermissionCollection, DNS is also consulted by
>> SocketPermission.implies, while synchronized, ouch!
>>
>> URIGrant.implies is non blocking, that's right zero contention.
>> SocketPermission.implies DNS calls can be avoided in most cases if
>> PermissionComparator finds an exact match or wild card.
>>
>> This is without the CombinerSecurityManager, which improves security
>> performance by a factor of 10 (SocketPermission is only checked once for
>> each AccessControlContext).
>>
>> PreferredClassProvider no longer creates unecessary DNS calls, and
>> neither does SecureClassLoader,  URLClassLoader or PreferredClassLoader.
>>
>> So the good news is the next release will feel much faster, the bad news
>> is that existing concurrency bugs that previously didn't appear during
>> test runs, but likely to manifest during production are now occurring
>> during testing.  There's more good news, we fixed a number of
>> concurrency bugs since the last release too, I guess I have to draw the
>> line somewhere and cut a release.
>>
>> DNS calls haven't been completely eliminated as some are still
>> necessary, but a multitude of unnecessary DNS calls have been eliminated.
>>
>> Regards,
>>
>> Peter.
>>
>> Peter Firmstone wrote:
>>> They've passed more consistently in the past, they're either
>>> concurrency bugs or network timing related, it would be nice to at
>>> least determine if it's the former or latter.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> Tom Hobbs wrote:
>>>> Are these all new failures, i.e. were they working before?  Or are they
>>>> "new" failures in that the test categories have only recently been
>>>> reactivated and the failures discovered?
>>>>
>>>> Are they a big enough blocker to stop a release?
>>>>
>>>>
>>>> On Sun, Mar 31, 2013 at 11:31 PM, Peter Firmstone <jini@zeus.net.au>
>>>> wrote:
>>>>
>>>>
>>>>> The following test fails 30 times in a run of 130 tests:
>>>>>
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>> [java]  #       of      tests   passed  =       0
>>>>> [java]  #       of      tests   passed  =       1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          100
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Buildfile: build.xml
>>>>>
>>>>> qa.run-tests:
>>>>>
>>>>> james-brown:
>>>>>    [delete] Deleting directory /opt/src/River_Fixed/**
>>>>> peterConcurrentPolicy/qa/soul
>>>>>     [mkdir] Created dir: /opt/src/River_Fixed/**
>>>>> peterConcurrentPolicy/qa/soul
>>>>>     [touch] Creating
>>>>> /opt/src/River_Fixed/**peterConcurrentPolicy/qa/soul/*
>>>>> *soul.201303312239034808
>>>>>
>>>>> run-tests:
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java] CONFIGURATION FILE:
>>>>>      [java]
>>>>>      [java]    /opt/src/River_Fixed/**peterConcurrentPolicy/qa/src/**
>>>>> com/sun/jini/test/resources/**qaHarness.prop
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java] SETTING UP THE TEST LIST:
>>>>>      [java]
>>>>>      [java]    Adding test:
>>>>> com/sun/jini/test/spec/**javaspace/conformance/
>>>>> **snapshot/**SnapshotExpirationNotifyTest.**td
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java] GENERAL HARNESS CONFIGURATION INFORMATION:
>>>>>      [java]
>>>>>      [java]    Date started:
>>>>>      [java]       Sun Mar 31 22:39:37 EST 2013
>>>>>      [java]    Installation directory of the JSK:
>>>>>      [java]       com.sun.jini.jsk.home=/opt/**src/River_Fixed/**
>>>>> peterConcurrentPolicy
>>>>>      [java]    Installation directory of the harness:
>>>>>      [java]       com.sun.jini.qa.home=/opt/src/**River_Fixed/**
>>>>> peterConcurrentPolicy/qa
>>>>>      [java]    Categories being tested:
>>>>>      [java]       categories=No Categories
>>>>>      [java] ------------------------------**-----------
>>>>>      [java] ENVIRONMENT PROPERTIES:
>>>>>      [java]
>>>>>      [java]    JVM information:
>>>>>      [java]       Java HotSpot(TM) Server VM, 20.5-b03, 32 bit VM mode
>>>>>      [java]       Sun Microsystems Inc.
>>>>>      [java]    OS information:
>>>>>      [java]       SunOS, 5.10, sparc
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java] STARTING TO RUN THE TESTS
>>>>>      [java]
>>>>>      [java]
>>>>>      [java] Running com/sun/jini/test/spec/**javaspace/conformance/**
>>>>> snapshot/**SnapshotExpirationNotifyTest.**td
>>>>>      [java] Time is Sun Mar 31 22:39:38 EST 2013
>>>>>      [java] Starting test in separate process with command:
>>>>>      [java] /usr/jdk/jdk1.6.0_30/jre/bin/**java
>>>>>
>>>>> -Djava.security.manager=org.**apache.river.api.security.**CombinerSecur
>>>>> ityManager
>>>>>
>>>>> -Djava.security.policy=file:/**opt/src/River_Fixed/**
>>>>> peterConcurrentPolicy/qa/**harness/policy/defaulttest.**policy
>>>>>
>>>>> -Djava.rmi.server.codebase=htt**p://bluto:9082/qa1-javaspace-**dl.jar<h
>>>>> ttp://bluto:9082/qa1-javaspace-dl.jar>-cp
>>>>> /opt/src/River_Fixed/
>>>>>
>>>>> **peterConcurrentPolicy/qa/lib/**jiniharness.jar:/opt/src/**River_Fixed
>>>>> /**
>>>>>
>>>>> peterConcurrentPolicy/qa/lib/**jinitests.jar:/opt/src/River_**
>>>>>
>>>>> Fixed/peterConcurrentPolicy/**lib/jsk-platform.jar:/opt/src/**River_Fix
>>>>> ed/
>>>>>
>>>>> **peterConcurrentPolicy/lib/jsk-**lib.jar:/opt/src/River_Fixed/**
>>>>>
>>>>> peterConcurrentPolicy/lib/**high-scale-lib.jar:/opt/src/**River_Fixed/*
>>>>> *
>>>>>
>>>>> peterConcurrentPolicy/lib/**custard-apple-1.0.2.jar -ea -esa -client
>>>>> -Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
>>>>>
>>>>> packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
>>>>> **
>>>>>
>>>>> ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
>>>>> -Dcom.sun.jini.jsk.port=9080 -Dcom.sun.jini.qa.port=9081
>>>>> -Dcom.sun.jini.jsk.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
>>>>>
>>>>> -Dcom.sun.jini.qa.home=/opt/**src/River_Fixed/**peterConcurrentPolicy/q
>>>>> a
>>>>>
>>>>> -Dcom.sun.jini.qa.harness.**harnessJar=/opt/src/River_**
>>>>> Fixed/peterConcurrentPolicy/**qa/lib/jiniharness.jar
>>>>> -Dcom.sun.jini.qa.harness.**testJar=/opt/src/River_Fixed/**
>>>>> peterConcurrentPolicy/qa/lib/**jinitests.jar
>>>>> -Dcom.sun.jini.qa.harness.**runjiniserver=true
>>>>> -Dcom.sun.jini.qa.harness.**runkitserver=true
>>>>> -Djava.security.properties=*
>>>>> *file:/opt/src/River_Fixed/**peterConcurrentPolicy/qa/**
>>>>> harness/trust/dynamic-policy.**properties
>>>>> -Dcom.sun.jini.qa.harness.**testhosts=
>>>>> -Djava.util.logging.config.**file=/home/peter/logging.**properties
>>>>>
>>>>> -Dcom.sun.jini.test.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
>>>>> /qa
>>>>>
>>>>> -Dcom.sun.jini.test.port=9082 -Dcom.sun.jini.qa.harness.**
>>>>> policies=file:/opt/src/River_**Fixed/peterConcurrentPolicy/**
>>>>> qa/src/com/sun/jini/test/**resources/jinitest.policy
>>>>> -Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
>>>>>
>>>>> packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
>>>>> **
>>>>>
>>>>> ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
>>>>> com.sun.jini.qa.harness.**MasterTest com/sun/jini/test/spec/**
>>>>> javaspace/conformance/**snapshot/**SnapshotExpirationNotifyTest.**td
>>>>>      [java] com.sun.jini.qa.harness.**TestException: Not all
>>>>> listeners've
>>>>> got expected number of events.
>>>>>      [java]     at com.sun.jini.test.spec.**javaspace.conformance.**
>>>>> snapshot.**SnapshotExpirationNotifyTest.**run(**
>>>>> SnapshotExpirationNotifyTest.**java:370)
>>>>>      [java]     at
>>>>> com.sun.jini.qa.harness.**MasterTest.doTest(MasterTest.*
>>>>> *java:256)
>>>>>      [java]     at
>>>>> com.sun.jini.qa.harness.**MasterTest.main(MasterTest.**
>>>>> java:144)
>>>>>      [java]
>>>>>      [java] TIME: 10:42:54 PM
>>>>>      [java]
>>>>>      [java] Test process was destroyed and returned code 1
>>>>>      [java]
>>>>> com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
>>>>> SnapshotExpirationNotifyTest.**td
>>>>>      [java] Test Failed: Test Failed:
>>>>> com.sun.jini.qa.harness.**TestException:
>>>>> Not all listeners've got expected number of events.
>>>>>      [java]
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java]
>>>>>      [java] SUMMARY ==============================**===
>>>>>      [java]
>>>>>      [java]
>>>>> com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
>>>>> SnapshotExpirationNotifyTest.**td
>>>>>      [java] Test Failed: Test Failed:
>>>>> com.sun.jini.qa.harness.**TestException:
>>>>> Not all listeners've got expected number of events.
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java]
>>>>>      [java] # of tests started   = 1
>>>>>      [java] # of tests completed = 1
>>>>>      [java] # of tests passed    = 0
>>>>>      [java] # of tests failed    = 1
>>>>>      [java]
>>>>>      [java] ------------------------------**-----------
>>>>>      [java]
>>>>>      [java]    Date finished:
>>>>>      [java]       Sun Mar 31 22:42:59 EST 2013
>>>>>      [java]    Time elapsed:
>>>>>      [java]       201 seconds
>>>>>      [java]
>>>>>      [java] Java Result: 1
>>>>>
>>>>> collect-result:
>>>>>
>>>>> BUILD FAILED
>>>>> /opt/src/River_Fixed/**peterConcurrentPolicy/build.**xml:2105: The
>>>>> following error occurred while executing this line:
>>>>> /opt/src/River_Fixed/**peterConcurrentPolicy/qa/**build.xml:357:
>>>>> condition satisfied
>>>>>
>>>>> Total time: 3 minutes 30 seconds
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>


Mime
View raw message