hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Planning to roll the 0.98.4 RC on 6/30
Date Thu, 26 Jun 2014 22:59:20 GMT
Additionally, we run unit tests in parallel to reduce the total time
required for test suite execution. Surefire will fork multiple JVMs,
dynamically generate test jars containing a subset of tests, and run them.
That can make isolating hanging tests difficult, but this behavior can be
influenced by defines on the Maven command line. For example, to fork a
process for every single unit test:

    mvn test -Dsurefire.firstPartForkMode=always -Dsurefire.secondPartForkMode=always

Then, if you find a hanging Surefire runner, you can dump the thread stacks
of that JVM. Because each JVM runs only one test, the unit test whose methods
you find in the stacks is the only one that contributed to the wedged state.
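A minimal shell sketch of that workflow, assuming the JDK tools jps and
jstack are on the PATH and that each forked Surefire JVM shows up in
`jps -l` either as org.apache.maven.surefire.booter.ForkedBooter or as a
surefirebooter*.jar path. The function names are hypothetical, and the
test-class pattern assumes the HBase convention of Test* classes under
org.apache.hadoop.hbase.

```shell
#!/usr/bin/env bash
# Sketch only: dump the thread stacks of forked Surefire JVMs and list the
# test classes whose frames appear in each dump.
# Assumptions: jps and jstack are on the PATH; Surefire forks appear in
# `jps -l` as ForkedBooter or as a surefirebooter*.jar; HBase test classes
# are named Test* under org.apache.hadoop.hbase. Function names are hypothetical.

# Read `jps -l` output on stdin and print the PIDs of Surefire fork JVMs.
surefire_pids() {
  awk '$2 ~ /ForkedBooter|surefirebooter/ { print $1 }'
}

# Read a thread dump on stdin and print each distinct test class found in it.
tests_in_dump() {
  grep -oE 'org\.apache\.hadoop\.hbase(\.[A-Za-z0-9_]+)*\.Test[A-Za-z0-9_]*' | sort -u
}

# Write each running fork's stacks to surefire-<pid>.jstack in the current
# directory, then report which test classes appear in that dump.
dump_surefire_stacks() {
  local pid
  for pid in $(jps -l | surefire_pids); do
    jstack "$pid" > "surefire-$pid.jstack"
    echo "== JVM $pid =="
    tests_in_dump < "surefire-$pid.jstack"
  done
}
```

Call dump_surefire_stacks while the suite is wedged. With one fork per test,
each JVM's report should name a single test class.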


On Thu, Jun 26, 2014 at 3:48 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Java 7u60 64-bit on an EC2 m3.4xlarge. Just running the unit test suite in
> a loop. I don't set any special Maven options in MAVEN_OPTS or anything
> like that.
>
> Historically, failures that occur when the whole suite executes, but not
> when individual tests run on their own, happen because one test does not
> shut down in a timely manner (or at all), and a subsequent test uses the
> same hardcoded path or port. When that happens we get a sporadic and
> sometimes load-sensitive failure. To complicate matters, each time one
> clones a repository on a different host or filesystem, JUnit may pick up a
> different test order, influenced by whatever readdir hands back for each
> package.
>
> On Thu, Jun 26, 2014 at 3:25 PM, Mikhail Antonov <olorinbant@gmail.com> wrote:
>
>> Andrew,
>>
>> Could you share some details: what environment are you running the tests
>> on, and at which point do they fail? I'm curious because lately I've been
>> seeing weird failures on current master too, which do not happen on
>> hadoop-qa. Individual tests always pass, but when running the suite, tests
>> either get stuck and time out (at roughly the same point) or fail with an
>> NPE or a PermGen exception. I blamed my environment at first, but maybe
>> it's something related.
>>
>> -Mikhail
>>
>>
>>
>>
>> 2014-06-26 13:39 GMT-07:00 Andrew Purtell <apurtell@apache.org>:
>>
>> > I'm finding that repeated runs of the unit test suite at the head of
>> > branch 0.98 intermittently fail. Individual tests do not, so this is
>> > likely a lagging shutdown, port/resource conflict, and/or zombie test
>> > issue. I am currently bisecting commits on the 0.98 branch since the last
>> > release in the hope of pinning this down to a single change. Depending on
>> > how quickly that can happen, the RC might happen on Monday or not. As
>> > things stand at the head of the branch, I'd not +1 the RC given the
>> > release criteria I've been using up to now.
>>
>
-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
