river-dev mailing list archives

From Patricia Shanahan <p...@acm.org>
Subject Re: ServiceDiscoveryManager test coverage
Date Fri, 27 Aug 2010 01:48:06 GMT
That would be ideal. However, an infrequent run of a very large test set
can be managed manually, with checklists.

Patricia


Jonathan Costers wrote:
> The QA harness is also supposed to be able to work in distributed mode, i.e.
> having multiple machines work together on one test run (splitting the work,
> so to speak).
> I haven't looked into that feature too much, though.
> 
> 2010/8/27 Patricia Shanahan <pats@acm.org>
> 
>> Based on some experiments, I am convinced a full run may take more than 24
>> hours, so even that may have to be selective. Jonathan Costers reports
>> killing a full run after several days. We may need three targets, in
>> addition to problem-specific categories:
>>
>> 1. A quick test that one would do, for example, after checking out and
>> building.
>>
>> 2. A more substantive test that would run in less than 24 hours, to do each
>> day.
>>
>> 3. A complete test that might take several machine-days, and that would be
>> run against a release candidate prior to release.
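>>
>> As a very rough sketch of how these could hang off the existing
>> run-categories target (the target names are invented, and the "categories"
>> property is an assumption about how run-categories is parameterized, so
>> treat this as illustration only):
>>
>>   <target name="qa.run.quick">
>>     <!-- small cross-section of categories for a post-build sanity check -->
>>     <antcall target="run-categories">
>>       <param name="categories" value="joinmanager,lookupservice"/>
>>     </antcall>
>>   </target>
>>   <target name="qa.run.daily">
>>     <antcall target="run-categories">
>>       <param name="categories" value="(larger set that fits in 24 hours)"/>
>>     </antcall>
>>   </target>
>>   <target name="qa.run.release">
>>     <antcall target="run-categories">
>>       <param name="categories" value="(all categories)"/>
>>     </antcall>
>>   </target>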
>>
>> Note that even if a test sequence takes several machine-days, that does not
>> necessarily mean days of elapsed time. Maybe some tests can be run in
>> parallel under the same OS copy. Even if that is not possible, we may be
>> able to gang up several physical or virtual machines, each running a subset
>> of the tests.
>>
>> I think virtual machines may work quite well, because a lot of the tests do
>> something, then wait around for a minute or two to see what happens. They
>> are not very resource-intensive.
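>>
>> Under a single OS copy, Ant's own <parallel> task might be enough to fan
>> out two subsets at once; a sketch, again assuming a hypothetical
>> "categories" property on the existing run-categories target:
>>
>>   <target name="qa.run.parallel">
>>     <parallel>
>>       <!-- each antcall runs an independent subset of the categories -->
>>       <antcall target="run-categories">
>>         <param name="categories" value="(subset A)"/>
>>       </antcall>
>>       <antcall target="run-categories">
>>         <param name="categories" value="(subset B)"/>
>>       </antcall>
>>     </parallel>
>>   </target>
>>
>> Whether two harness instances can share one machine without colliding on
>> ports or lookup groups is an open question, of course.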
>>
>> Patricia
>>
>> Peter Firmstone wrote:
>>
>>> Hi JC,
>>>
>>> Can we have an ant target for running all the tests?
>>>
>>> And how about a qa.run.hudson target?
>>>
>>> I usually use run-categories to isolate what I'm working on, but we
>>> definitely need a target that runs everything that should be run, even if
>>> it takes overnight.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> Jonathan Costers wrote:
>>>
>>>> 2010/8/24 Patricia Shanahan <pats@acm.org>
>>>>
>>>>> On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>>> ...
>>>>>
>>>>>> Thanks Patricia, that's very helpful, I'll figure out where I went
>>>>>> wrong this week, it really shows the importance of full test coverage.
>>>>>
>>>>> ...
>>>>>
>>>>> I strongly agree that test coverage is important. Accordingly, I've done
>>>>> some analysis of the "ant qa.run" output.
>>>>>
>>>>> There are 1059 test description (*.td) files that exist, and are loaded
>>>>> at the start of "ant qa.run", but that do not seem to be run. I've
>>>>> extracted the top-level categories from those files:
>>>>>
>>>>> constraint
>>>>> discoveryproviders_impl
>>>>> discoveryservice
>>>>> end2end
>>>>> eventmailbox
>>>>> export_spec
>>>>> io
>>>>> javaspace
>>>>> jeri
>>>>> joinmanager
>>>>> jrmp
>>>>> loader
>>>>> locatordiscovery
>>>>> lookupdiscovery
>>>>> lookupservice
>>>>> proxytrust
>>>>> reliability
>>>>> renewalmanager
>>>>> renewalservice
>>>>> scalability
>>>>> security
>>>>> start
>>>>> txnmanager
>>>>>
>>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>>> categories that are being run, or otherwise inappropriate, but there
>>>>> does seem to be a rich vein of tests we could mine.
>>>>>
>>>> The QA harness loads all .td files under the "spec" and "impl"
>>>> directories when starting, and then selects only the ones tagged with the
>>>> categories that we specify from the Ant target.
>>>> Whenever a test is really obsolete or otherwise not supposed to run, it
>>>> is marked with a "SkipTestVerifier" in its .td file.
>>>> Most of these tests are genuine, though, and should be run.
>>>> There are more categories than the ones you mention above, for instance:
>>>> "spec", "id", "id_spec", etc.
>>>> Also, some tests are tagged with multiple categories, so duplicates can
>>>> exist when assembling the list of tests to run.
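>>>>
>>>> For illustration, a skipped test's .td file looks roughly like this (a
>>>> sketch from memory; the exact property keys and the verifier's package
>>>> name are assumptions, so check an existing skipped .td for the real
>>>> form):
>>>>
>>>>   testClass=com.sun.jini.test.spec.foo.SomeTest
>>>>   testCategories=foo,foo_spec
>>>>   # a comment here usually explains why the test is being skipped
>>>>   testVerifiers=com.sun.jini.qa.harness.SkipTestVerifier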
>>>>
>>>> The reason not all of them are run (by Hudson) now is that we give a
>>>> specific set of test categories that are known (to me) to run smoothly.
>>>> There are many others that are not run (by default) because issues are
>>>> present with one or more of the tests in those categories.
>>>>
>>>> I completely agree that we should not exclude a complete test category
>>>> because of one failing test.
>>>> What we probably should do is tag any problematic test (whether for
>>>> infrastructure or other reasons) with a SkipTestVerifier for the time
>>>> being, so that it is not taken into account by the QA harness for now.
>>>> That way, we can add all test categories to the default Ant run.
>>>> However, this would take a large amount of time to run (I've tried it
>>>> once, and killed the process after several days), which brings us to your
>>>> next point:
>>>>
>>>>> Part of the problem may be the time it takes to run the tests. I'd like
>>>>> to propose splitting the tests into two sets:
>>>>>
>>>>> 1. A small set that one would run, in addition to the relevant tests,
>>>>> whenever making a small change. It should *not* be based on skipping
>>>>> complete categories, but on doing those tests from each category that
>>>>> are most likely to detect regression, especially regression due to
>>>>> changes in other areas.
>>>>>
>>>> Completely agree. However, most of the QA tests are not clear-cut unit or
>>>> regression tests. They are more integration/conformance tests that
>>>> exercise the requirements of the spec and its implementation.
>>>> Identifying the "right" tests to run as part of the small set you mention
>>>> would require going through all 1059 test descriptions and their sources.
>>>>
>>>>> 2. A full test set that may take a lot longer. In many projects, there
>>>>> is a "nightly build" and a test sequence that is run against that build.
>>>>> That test sequence can take up to 24 hours to run, and should be as
>>>>> complete as possible. Does Apache have infrastructure to support this
>>>>> sort of operation?
>>>>>
>>>> Again, completely agree. I'm sure Apache supports this through Hudson. We
>>>> could request to set up a second build job, doing nightly builds and
>>>> running the whole test suite. I think this is the only way to make
>>>> running the complete QA suite automatically practical.
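>>>>
>>>> In build.xml terms, the nightly job could then boil down to something
>>>> like this (a sketch only; the "build" dependency and the target name are
>>>> hypothetical, while qa.run is the existing target discussed above):
>>>>
>>>>   <target name="qa.run.nightly" depends="build">
>>>>     <!-- once all categories are in the default run and problematic
>>>>          tests carry a SkipTestVerifier, this runs everything runnable -->
>>>>     <antcall target="qa.run"/>
>>>>   </target>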
>>>>
>>>>> Are there any tests that people *know* should not run? I'm thinking of
>>>>> running the lot just to see what happens, but knowing which ones are not
>>>>> expected to work would help with interpreting the results.
>>>>>
>>>> See above: tests of that type should already have been tagged to be
>>>> skipped by the good people who donated this test suite.
>>>> I've noticed that usually, when a SkipTestVerifier is used in a .td file,
>>>> someone has put comments in there explaining why it was tagged as such.
>>>>
>>>>> Patricia

