river-dev mailing list archives

From Jonathan Costers <jonathan.cost...@googlemail.com>
Subject Re: ServiceDiscoveryManager test coverage
Date Fri, 27 Aug 2010 17:24:50 GMT
By default, qa.run will log at INFO level (which is what I used for the bulk
run above).

When I need more logging, I specify my own logging.properties file, based
on what is in qa1.logging, and fine-tune the logging settings depending on
what I am testing.

For instance, in your build.properties:

log.config=/home/jonathan/logging.properties
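
A minimal sketch of such a logging.properties, tuned for the RetryTask
question below, might look like the following (the two logger names here
are assumptions and should be checked against qa1.logging before use):

handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = FINEST

# everything else stays at the default level
.level = INFO

# assumed logger names; verify against qa1.logging
net.jini.lookup.JoinManager.level = FINEST
com.sun.jini.thread.level = FINEST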

2010/8/27 Patricia Shanahan <pats@acm.org>

> Excellent! Once the servicediscovery regression is fixed that can be added.
>
> Do you run your tests with logging enabled, and if so at what level? I have
> a specific coverage issue involving JoinManager and RetryTask. As far as I
> can tell, we are not testing what happens when a RetryTask has to do a
> Retry, and I believe tasks can get out of order in undesirable ways when
> that happens. If retries are being tested, at the FINEST logging level we
> would see messages from RetryTask containing "retry of".
>
> I would like to know about any tests that produce those messages.
>
> Patricia
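
[One way to find such tests, assuming the output of a FINEST-level run is
captured to files under a results directory, is a recursive grep; the
directory name below is a placeholder:

grep -rl "retry of" <qa-results-dir>

Any file it lists came from a run that actually hit a retry.]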
>
>
>
>
>
> On 8/27/2010 3:29 AM, Jonathan Costers wrote:
>
>> I just ran the set of tests that are currently being selected when
>> executing the qa.run target, after I added a couple more categories:
>>
>> # of tests started   = 497
>> # of tests completed = 497
>> # of tests skipped   = 21
>> # of tests passed    = 497
>> # of tests failed    = 0
>>
>> -----------------------------------------
>>
>>    Date finished:
>>       Fri Aug 27 12:21:04 CEST 2010
>>    Time elapsed:
>>       27258 seconds
>>
>> BUILD SUCCESSFUL (total time: 454 minutes 20 seconds)
>>
>> The categories that are run are:
>>
>> id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery
>>
>> Looks like we almost have 50% coverage now (about 500 tests out of 1000+).
>>
>> On my system (an Intel Quad Core with 4GB of memory), this took 7-8 hours
>> to run.
>>
>> 2010/8/27 Patricia Shanahan <pats@acm.org>
>>
>>> That would be ideal. However, an infrequent run of a very large test set
>>> can be managed manually, with check lists.
>>>
>>> Patricia
>>>
>>>
>>>
>>> Jonathan Costers wrote:
>>>
>>>> The QA harness is also supposed to be able to work in distributed mode,
>>>> i.e. having multiple machines work together on one test run (splitting
>>>> the work, so to speak).
>>>> I haven't looked into that feature too much, though.
>>>>
>>>> 2010/8/27 Patricia Shanahan <pats@acm.org>
>>>>
>>>>> Based on some experiments, I am convinced a full run may take more than
>>>>> 24 hours, so even that may have to be selective. Jonathan Costers reports
>>>>> killing a full run after several days. We may need three targets, in
>>>>> addition to problem-specific categories:
>>>>>
>>>>> 1. A quick test that one would do, for example, after checking out and
>>>>> building.
>>>>>
>>>>> 2. A more substantive test that would run in less than 24 hours, to do
>>>>> each day.
>>>>>
>>>>> 3. A complete test that might take several machine-days, and that would
>>>>> be run against a release candidate prior to release.
>>>>>
>>>>> Note that even if a test sequence takes several machine-days, that does
>>>>> not necessarily mean days of elapsed time. Maybe some tests can be run
>>>>> in parallel under the same OS copy. Even if that is not possible, we may
>>>>> be able to gang up several physical or virtual machines, each running a
>>>>> subset of the tests.
>>>>>
>>>>> I think virtual machines may work quite well because a lot of the tests
>>>>> do something then wait around a minute or two to see what happens. They
>>>>> are not very intensive resource users.
>>>>>
>>>>> Patricia
>>>>>
>>>>>
>>>>>
>>>>> Peter Firmstone wrote:
>>>>>
>>>>>> Hi JC,
>>>>>>
>>>>>> Can we have an ant target for running all the tests?
>>>>>>
>>>>>> And how about a qa.run.hudson target?
>>>>>>
>>>>>> I usually use run-categories, to isolate what I'm working on, but we
>>>>>> definitely need a target that runs everything that should be, even if
>>>>>> it does take overnight.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Peter.
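
[A rough sketch of what such a wrapper target could look like, assuming the
existing qa.run honors a property for its category list; the property name
qa.categories is hypothetical and must be replaced by whatever build.xml
actually reads. The category list combines the ones run today with the
additional ones listed further down this thread:

<target name="qa.all" description="run every QA category; may take days">
  <antcall target="qa.run">
    <!-- hypothetical property name; substitute the one build.xml expects -->
    <param name="qa.categories"
           value="id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery,discoveryservice,eventmailbox,javaspace,jeri,lookupservice,renewalservice,security,start,txnmanager"/>
  </antcall>
</target>

A qa.run.hudson target could be a similar wrapper restricted to the
categories known to run cleanly.]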
>>>>>>
>>>>>> Jonathan Costers wrote:
>>>>>>
>>>>>>> 2010/8/24 Patricia Shanahan <pats@acm.org>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> Thanks Patricia, that's very helpful, I'll figure out where I went
>>>>>>>>> wrong this week, it really shows the importance of full test
>>>>>>>>> coverage.
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>
>>>>>>>> I strongly agree that test coverage is important. Accordingly, I've
>>>>>>>> done some analysis of the "ant qa.run" output.
>>>>>>>>
>>>>>>>> There are 1059 test description (*.td) files that exist, and are
>>>>>>>> loaded at the start of "ant qa.run", but that do not seem to be run.
>>>>>>>> I've extracted the top level categories from those files:
>>>>>>>>
>>>>>>>> constraint
>>>>>>>> discoveryproviders_impl
>>>>>>>> discoveryservice
>>>>>>>> end2end
>>>>>>>> eventmailbox
>>>>>>>> export_spec
>>>>>>>> io
>>>>>>>> javaspace
>>>>>>>> jeri
>>>>>>>> joinmanager
>>>>>>>> jrmp
>>>>>>>> loader
>>>>>>>> locatordiscovery
>>>>>>>> lookupdiscovery
>>>>>>>> lookupservice
>>>>>>>> proxytrust
>>>>>>>> reliability
>>>>>>>> renewalmanager
>>>>>>>> renewalservice
>>>>>>>> scalability
>>>>>>>> security
>>>>>>>> start
>>>>>>>> txnmanager
>>>>>>>>
>>>>>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>>>>>> categories that are being run, or otherwise inappropriate, but there
>>>>>>>> does seem to be a rich vein of tests we could mine.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> The QA harness loads all .td files under the "spec" and "impl"
>>>>>>> directories when starting, and retains only the ones tagged with the
>>>>>>> categories that we specify from the Ant target.
>>>>>>> Whenever a test is really obsolete or otherwise not supposed to run,
>>>>>>> it is marked with a "SkipTestVerifier" in its .td file.
>>>>>>> Most of these tests are genuine and should be run, though.
>>>>>>> There are more categories than the ones you mention above, for
>>>>>>> instance: "spec", "id", "id_spec", etc.
>>>>>>> Also, some tests are tagged with multiple categories, and as such
>>>>>>> duplicates can exist when assembling the list of tests to run.
>>>>>>>
>>>>>>> The reason not all of them are run (by Hudson) now is that we give a
>>>>>>> specific set of test categories that are known (to me) to run
>>>>>>> smoothly.
>>>>>>> There are many others that are not run (by default) because issue(s)
>>>>>>> are present with one or more of the tests in that category.
>>>>>>>
>>>>>>> I completely agree that we should not exclude complete test
>>>>>>> categories because of one test failing.
>>>>>>> What we probably should do is tag any problematic test (due to
>>>>>>> infrastructure or other reasons) with a SkipTestVerifier for the time
>>>>>>> being, so that it is not taken into account by the QA harness for now.
>>>>>>> That way, we can add all test categories to the default Ant run.
>>>>>>> However, this would take a large amount of time to run (I've tried it
>>>>>>> once, and killed the process after several days), which brings us to
>>>>>>> your next point:
>>>>>>>
>>>>>>>> Part of the problem may be time to run the tests. I'd like to propose
>>>>>>>> splitting the tests into two sets:
>>>>>>>>
>>>>>>>> 1. A small set that one would run in addition to the relevant tests,
>>>>>>>> whenever making a small change. It should *not* be based on skipping
>>>>>>>> complete categories, but on doing those tests from each category that
>>>>>>>> are most likely to detect regression, especially regression due to
>>>>>>>> changes in other areas.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Completely agree. However, most of the QA tests are not clear unit or
>>>>>>> regression tests. They are more integration/conformance tests that
>>>>>>> test the requirements of the spec and its implementation.
>>>>>>> Identifying the list of "right" tests to run as part of the small set
>>>>>>> you mention would require going through all 1059 test descriptions
>>>>>>> and their sources.
>>>>>>>
>>>>>>>> 2. A full test set that may take a lot longer. In many projects, there
>>>>>>>> is a "nightly build" and a test sequence that is run against that
>>>>>>>> build. That test sequence can take up to 24 hours to run, and should
>>>>>>>> be as complete as possible. Does Apache have infrastructure to support
>>>>>>>> this sort of operation?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Again, completely agree. I'm sure Apache supports this through Hudson.
>>>>>>> We could request to set up a second build job, doing nightly builds
>>>>>>> and running the whole test suite. I think this is the only way to make
>>>>>>> running the complete QA suite automatically practical.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Are there any tests that people *know* should not run? I'm thinking
>>>>>>>> of running the lot just to see what happens, but knowing ones that are
>>>>>>>> not expected to work would help with result interpretation.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> See above, tests of that type should have already been tagged to be
>>>>>>> skipped by the good people that donated this test suite.
>>>>>>> I've noticed that usually, when a SkipTestVerifier is used in a .td
>>>>>>> file, someone has put some comments in there to explain why it was
>>>>>>> tagged as such.
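
[For readers unfamiliar with the test descriptions: a skipped .td is an
ordinary properties-style file whose entries name the test class, its
categories, and a verifier, usually with a comment explaining the skip.
The keys and the package of the verifier class below are illustrative
assumptions only; copy the exact spelling from an existing skipped .td:

# skipped: <reason recorded here by whoever tagged the test>
testClass=SomeExampleTest
testCategories=lookupservice,spec
testVerifiers=com.sun.jini.qa.harness.SkipTestVerifier
]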
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Patricia
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
