river-dev mailing list archives

From: Patricia Shanahan <p...@acm.org>
Subject: Re: ServiceDiscoveryManager test coverage
Date: Fri, 27 Aug 2010 16:46:09 GMT
Excellent! Once the servicediscovery regression is fixed, that can be added.

Do you run your tests with logging enabled, and if so at what level? I 
have a specific coverage issue involving JoinManager and RetryTask. As 
far as I can tell, we are not testing what happens when a RetryTask has 
to do a Retry, and I believe tasks can get out of order in undesirable 
ways when that happens. If retries are being tested, at the FINEST 
logging level we would see messages from RetryTask containing "retry of".

I would like to know about any tests that produce those messages.
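
For anyone who wants to check existing QA output, here is a minimal 
sketch of such a scan (standalone Java, not part of the harness; the 
default "qa-logs" directory name is only a placeholder for wherever your 
run writes its logs, and the logs are assumed to be plain text):

import java.io.*;

// Minimal sketch: recursively scan a directory of QA log output for the
// "retry of" messages RetryTask logs at FINEST.
// The default directory name "qa-logs" is only a placeholder.
public class FindRetryMessages {

    public static void main(String[] args) throws IOException {
        scan(new File(args.length > 0 ? args[0] : "qa-logs"));
    }

    private static void scan(File f) throws IOException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) return;
            for (File child : children) scan(child);
        } else {
            BufferedReader in = new BufferedReader(new FileReader(f));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.contains("retry of")) {
                        System.out.println(f + ": " + line.trim());
                        break; // one hit per file is enough to flag it
                    }
                }
            } finally {
                in.close();
            }
        }
    }
}

Run over a full qa.run log directory, this should make it easy to see 
which tests, if any, exercise the retry path.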

Patricia




On 8/27/2010 3:29 AM, Jonathan Costers wrote:
> I just ran the set of tests that are currently being selected when executing
> the qa.run target, after I added a couple more categories:
>
> # of tests started   = 497
> # of tests completed = 497
> # of tests skipped   = 21
> # of tests passed    = 497
> # of tests failed    = 0
>
> -----------------------------------------
>
>     Date finished:
>        Fri Aug 27 12:21:04 CEST 2010
>     Time elapsed:
>        27258 seconds
>
> BUILD SUCCESSFUL (total time: 454 minutes 20 seconds)
>
> The categories that are run are:
> id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery
>
> Looks like we almost have 50% coverage now (about 500 tests out of 1000+).
>
> On my system (an Intel Quad Core with 4GB of memory), this took 7-8 hours to
> run.
>
> 2010/8/27 Patricia Shanahan <pats@acm.org>
>
>> That would be ideal. However, an infrequent run of a very large test set
>> can be managed manually, with check lists.
>>
>> Patricia
>>
>>
>>
>> Jonathan Costers wrote:
>>
>>> The QA harness is also supposed to be able to work in distributed mode,
>>> i.e. having multiple machines work together on one test run (splitting
>>> the work so to speak).
>>> I haven't looked into that feature too much though.
>>>
>>> 2010/8/27 Patricia Shanahan <pats@acm.org>
>>>
>>>> Based on some experiments, I am convinced a full run may take more than
>>>> 24 hours, so even that may have to be selective. Jonathan Costers reports
>>>> killing a full run after several days. We may need three targets, in
>>>> addition to problem-specific categories:
>>>>
>>>> 1. A quick test that one would do, for example, after checking out and
>>>> building.
>>>>
>>>> 2. A more substantive test that would run in less than 24 hours, to do
>>>> each day.
>>>>
>>>> 3. A complete test that might take several machine-days, and that would
>>>> be run against a release candidate prior to release.
>>>>
>>>> Note that even if a test sequence takes several machine-days, that does
>>>> not necessarily mean days of elapsed time. Maybe some tests can be run
>>>> in parallel under the same OS copy. Even if that is not possible, we may
>>>> be able to gang up several physical or virtual machines, each running a
>>>> subset of the tests.
>>>>
>>>> I think virtual machines may work quite well because a lot of the tests
>>>> do something then wait around a minute or two to see what happens. They
>>>> are not very intensive resource users.
>>>>
>>>> Patricia
>>>>
>>>>
>>>>
>>>> Peter Firmstone wrote:
>>>>
>>>>> Hi JC,
>>>>>
>>>>> Can we have an ant target for running all the tests?
>>>>>
>>>>> And how about a qa.run.hudson target?
>>>>>
>>>>> I usually use run-categories, to isolate what I'm working on, but we
>>>>> definitely need a target that runs everything that should be, even if
>>>>> it does take overnight.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>>> Jonathan Costers wrote:
>>>>>
>>>>>> 2010/8/24 Patricia Shanahan <pats@acm.org>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>>>>> ...
>>>>>>>
>>>>>>>> Thanks Patricia, that's very helpful, I'll figure it out where I went
>>>>>>>> wrong this week, it really shows the importance of full test
>>>>>>>> coverage.
>>>>>>>>
>>>>>>>> ...
>>>>>>>
>>>>>>> I strongly agree that test coverage is important. Accordingly, I've
>>>>>>> done some analysis of the "ant qa.run" output.
>>>>>>>
>>>>>>> There are 1059 test description (*.td) files that exist, and are
>>>>>>> loaded at the start of "ant qa.run", but that do not seem to be run.
>>>>>>> I've extracted the top level categories from those files:
>>>>>>>
>>>>>>> constraint
>>>>>>> discoveryproviders_impl
>>>>>>> discoveryservice
>>>>>>> end2end
>>>>>>> eventmailbox
>>>>>>> export_spec
>>>>>>> io
>>>>>>> javaspace
>>>>>>> jeri
>>>>>>> joinmanager
>>>>>>> jrmp
>>>>>>> loader
>>>>>>> locatordiscovery
>>>>>>> lookupdiscovery
>>>>>>> lookupservice
>>>>>>> proxytrust
>>>>>>> reliability
>>>>>>> renewalmanager
>>>>>>> renewalservice
>>>>>>> scalability
>>>>>>> security
>>>>>>> start
>>>>>>> txnmanager
>>>>>>>
>>>>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>>>>> categories that are being run, or otherwise inappropriate, but there
>>>>>>> does seem to be a rich vein of tests we could mine.
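
A scan along the lines Patricia describes can be scripted. Below is a 
rough, hypothetical sketch that walks a qa source tree, reads each *.td 
file as a properties file, and prints the distinct category names it 
finds. Both the default "qa/src" root and the "testCategories" key name 
are assumptions and would need to be matched to the actual test 
description format.

import java.io.*;
import java.util.*;

// Rough sketch of the category extraction described above. The tree root
// ("qa/src") and the property key ("testCategories") are placeholders --
// substitute whatever the QA harness actually uses.
public class ListTdCategories {

    public static void main(String[] args) throws IOException {
        Set<String> categories = new TreeSet<String>();
        walk(new File(args.length > 0 ? args[0] : "qa/src"), categories);
        for (String c : categories) {
            System.out.println(c);
        }
    }

    private static void walk(File f, Set<String> categories) throws IOException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) return;
            for (File child : children) walk(child, categories);
        } else if (f.getName().endsWith(".td")) {
            Properties p = new Properties();
            InputStream in = new FileInputStream(f);
            try {
                p.load(in);
            } finally {
                in.close();
            }
            String cats = p.getProperty("testCategories"); // assumed key name
            if (cats != null) {
                for (String c : cats.split("[,\\s]+")) {
                    if (c.length() > 0) categories.add(c);
                }
            }
        }
    }
}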
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> The QA harness loads all .td files under the "spec" and "impl"
>>>>>> directories when starting, and only selects the ones that are tagged
>>>>>> with the categories that we specify in the Ant target.
>>>>>> Whenever a test is really obsolete or otherwise not supposed to run, it
>>>>>> is marked with a "SkipTestVerifier" in its .td file.
>>>>>> Most of these are genuine and should be run though.
>>>>>> There are more categories than the ones you mention above, for
>>>>>> instance: "spec", "id", "id_spec", etc.
>>>>>> Also, some tests are tagged with multiple categories and as such
>>>>>> duplicates can exist when assembling the list of tests to run.
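
To see which tests are currently withheld this way, a plain text search 
for the marker is probably enough. The following minimal sketch 
(standalone Java; the "qa/src" root is again only a placeholder) lists 
every *.td file that mentions SkipTestVerifier so those test descriptions 
can be reviewed in one pass.

import java.io.*;

// Minimal sketch: list every test description (*.td) file that mentions
// SkipTestVerifier, so the currently withheld tests can be reviewed.
// The default root "qa/src" is a placeholder for the real qa source tree.
public class ListSkippedTests {

    public static void main(String[] args) throws IOException {
        walk(new File(args.length > 0 ? args[0] : "qa/src"));
    }

    private static void walk(File f) throws IOException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) return;
            for (File child : children) walk(child);
        } else if (f.getName().endsWith(".td")) {
            BufferedReader in = new BufferedReader(new FileReader(f));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.contains("SkipTestVerifier")) {
                        System.out.println(f);
                        break;
                    }
                }
            } finally {
                in.close();
            }
        }
    }
}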
>>>>>>
>>>>>> The reason not all of them are run (by Hudson) now is that we give a
>>>>>> specific set of test categories that are known (to me) to run smoothly.
>>>>>> There are many others that are not run (by default) because issue(s)
>>>>>> are present with one or more of the tests in that category.
>>>>>>
>>>>>> I completely agree that we should not exclude complete test
>>>>>> categories because of one test failing.
>>>>>> What we probably should do is tag any problematic test (due to
>>>>>> infrastructure or other reasons) with a SkipTestVerifier for the time
>>>>>> being, so that it is not taken into account by the QA harness for now.
>>>>>> That way, we can add all test categories to the default Ant run.
>>>>>> However, this would take a large amount of time to run (I've tried it
>>>>>> once, and killed the process after several days), which brings us to
>>>>>> your next point:
>>>>>>
>>>>>>> Part of the problem may be time to run the tests. I'd like to propose
>>>>>>> splitting the tests into two sets:
>>>>>>>
>>>>>>> 1. A small set that one would run in addition to the relevant tests,
>>>>>>> whenever making a small change. It should *not* be based on skipping
>>>>>>> complete categories, but on doing those tests from each category that
>>>>>>> are most likely to detect regression, especially regression due to
>>>>>>> changes in other areas.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Completely agree. However, most of the QA tests are not clear unit or
>>>>>> regression tests. They are more integration/conformance tests that
>>>>>> test the requirements of the spec and its implementation.
>>>>>> Identifying the list of "right" tests to run as part of the small set
>>>>>> you mention would require going through all 1059 test descriptions and
>>>>>> their sources.
>>>>>>
>>>>>>> 2. A full test set that may take a lot longer. In many projects, there
>>>>>>> is a "nightly build" and a test sequence that is run against that
>>>>>>> build. That test sequence can take up to 24 hours to run, and should
>>>>>>> be as complete as possible. Does Apache have infrastructure to support
>>>>>>> this sort of operation?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Again, completely agree. I'm sure Apache supports this through Hudson.
>>>>>> We could request to set up a second build job, doing nightly builds and
>>>>>> running the whole test suite. I think this is the only way to make
>>>>>> running the complete QA suite automatically practical.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Are there any tests that people *know* should not run? I'm thinking of
>>>>>>> running the lot just to see what happens, but knowing ones that are
>>>>>>> not expected to work would help with result interpretation.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> See above, tests of that type should have already been tagged to be
>>>>>> skipped by the good people that donated this test suite.
>>>>>> I've noticed that usually, when a SkipTestVerifier is used in a .td
>>>>>> file, someone has put some comments in there to explain why it was
>>>>>> tagged as such.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Patricia
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>

