hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1831) Hudson should kill long running tests
Date Wed, 05 Sep 2007 23:59:33 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525258 ]

Jim Kellerman commented on HADOOP-1831:
---------------------------------------

On Wed, 2007-09-05 at 14:21 -0700, Nigel Daley (JIRA) wrote:
> But I think the problem is with JUnit.  JUnit is *supposed* to time out a test if it is
> taking longer than 15 minutes.  This doesn't seem to work reliably if a test gets really
> 'wedged'.

Understood. But how difficult would it be to start a subprocess from the build just prior
to starting a test, and have it monitor the test and kill it if it takes too long?

(See the section "Killing a hung test" at http://wiki.apache.org/lucene-hadoop/HudsonBuildServer)

Once the test has been killed or if the test exits normally, the subprocess would just exit.
The task that could do this is a pretty simple piece of shell-scripting.
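
A minimal sketch of such a watchdog follows. It is an illustration only, not something taken from the Hudson configuration: the function name watch_and_kill, its arguments, and the one-second polling interval are all assumptions. It sends SIGQUIT first (which makes a JVM print a thread dump) and then SIGKILL, as the issue description suggests.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; the function name, argument order and
# polling interval are assumptions made for illustration.

# watch_and_kill PID TIMEOUT_SECS [GRACE_SECS]
# Polls PID once a second. If PID is still alive after TIMEOUT_SECS,
# sends SIGQUIT, waits GRACE_SECS, then sends SIGKILL.
# Returns 0 if the process went away on its own, 1 if it had to be killed.
watch_and_kill() {
    wd_pid=$1
    wd_timeout=$2
    wd_grace=${3:-5}
    wd_elapsed=0
    while [ "$wd_elapsed" -lt "$wd_timeout" ]; do
        # kill -0 delivers no signal; it only checks that the process exists.
        kill -0 "$wd_pid" 2>/dev/null || return 0
        sleep 1
        wd_elapsed=$((wd_elapsed + 1))
    done
    kill -0 "$wd_pid" 2>/dev/null || return 0
    kill -QUIT "$wd_pid" 2>/dev/null   # ask for a thread dump first
    sleep "$wd_grace"                  # give the dump time to be written
    kill -9 "$wd_pid" 2>/dev/null      # then kill unconditionally
    return 1
}

# Usage sketch (some_test_command is a placeholder; 900 s = 15 minutes):
#   some_test_command &
#   test_pid=$!
#   watch_and_kill "$test_pid" 900 &
#   wait "$test_pid"
```

Because the watchdog polls rather than sleeping for the full timeout, it exits within about a second of the test finishing normally. It identifies the test only by process ID, so it assumes that ID is not reused by another process during the run.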

When I have killed just the process running the test manually, the build resumes.

If we did this, I don't think we'd need a timeout on the whole build, because the reason builds
take a long time is a hung test.

> Note too that having Hudson time out a patch build won't have the effect you desire.
> It will simply hang the patch queue since the 'current' link on the filesystem to the
> patch being tested won't get removed.

I wasn't really suggesting killing the whole build. In my experience just doing a kill -9
on the stuck test kills the test, and the build just resumes.


> Hudson should kill long running tests
> -------------------------------------
>
>                 Key: HADOOP-1831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1831
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: build
>    Affects Versions: 0.15.0
>            Reporter: Jim Kellerman
>             Fix For: 0.15.0
>
>
> Hudson should kill long running tests. (I believe it is supposed to but doesn't quite
> seem to do the job if the test is really hung up).
> It would be nice if, when the timer goes off, Hudson did a {code}kill -QUIT{code} (to
> try to get a thread dump) and then followed that with a {code}kill -9{code}
> (See the section "Killing a hung test" at http://wiki.apache.org/lucene-hadoop/HudsonBuildServer)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

