hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14589) Looking for the surefire-killer; builds being killed...
Date Thu, 29 Oct 2015 17:19:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980839#comment-14980839 ]

stack commented on HBASE-14589:
-------------------------------

Looking for the surefire-killer... what is causing these:

ExecutionException: java.lang.RuntimeException: The forked VM terminated without properly
saying goodbye. VM crash or System.exit called?

Looking at a recent failure in 1.3:

{code}
kalashnikov:hbase.git.commit2 stack$ python ./dev-support/findHangingTests.py  https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.3/322/jdk=latest1.7,label=Hadoop/consoleText
Fetching https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.3/322/jdk=latest1.7,label=Hadoop/consoleText
Building remotely on H4 (Mapreduce zookeeper Hadoop Pig falcon Hdfs) in workspace /home/jenkins/jenkins-slave/workspace/HBase-1.3/jdk/latest1.7/label/Hadoop
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.client.TestMetaWithReplicas
Hanging test : org.apache.hadoop.hbase.client.TestHCM
Hanging test : org.apache.hadoop.hbase.client.TestSnapshotFromClientWithRegionReplicas
Hanging test : org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient
{code}

I notice that the four hanging tests above don't produce XML files, just .txt files (I was
hoping that an unclosed XML file would help identify the bad tests).

Also, the above are a mix of medium and large tests: the first two are medium and the latter
two are large.

The above are described as 'hanging' tests, but all that means is that they were started
and never reported an ending.
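
A minimal sketch of that "started but no reported ending" check, for illustration only. It is not the code in dev-support/findHangingTests.py. It assumes the surefire console prints "Running <class>" when a test class starts and a "Tests run: ... - in <class>" line when it finishes; the function name and both patterns are assumptions:

```python
import re

# Assumed surefire console format (not taken from findHangingTests.py itself):
#   "Running <class>" when a test class starts
#   "Tests run: ..., Time elapsed: ... - in <class>" when it finishes
STARTED = re.compile(r"^Running (\S+)\s*$")
FINISHED = re.compile(r"^Tests run:.* - in (\S+)\s*$")


def find_unfinished(console_text):
    """Return test classes that were started but never reported a result."""
    started = []      # list, so first-seen order is preserved
    finished = set()
    for line in console_text.splitlines():
        match = STARTED.match(line)
        if match:
            if match.group(1) not in started:
                started.append(match.group(1))
            continue
        match = FINISHED.match(line)
        if match:
            finished.add(match.group(1))
    return [name for name in started if name not in finished]
```

A class shows up in the result whether it truly hung or was killed from outside before it could report, which is why "hanging" is a loose label here.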

I see this in the output:

Killed
Killed
Killed
Killed
Killed

So five were killed but only four show as started without ending.
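
The mismatch can be checked mechanically. A small sketch, assuming a kill shows up in the console as a line containing only the word "Killed" (as in the output above); the function names are hypothetical:

```python
def count_killed(console_text):
    """Count console lines that consist solely of the word 'Killed'."""
    return sum(1 for line in console_text.splitlines() if line.strip() == "Killed")


def unexplained_kills(console_text, hanging_tests):
    """Return how many 'Killed' lines exceed the number of hanging tests."""
    return max(0, count_killed(console_text) - len(hanging_tests))


if __name__ == "__main__":
    hanging = [
        "org.apache.hadoop.hbase.client.TestMetaWithReplicas",
        "org.apache.hadoop.hbase.client.TestHCM",
        "org.apache.hadoop.hbase.client.TestSnapshotFromClientWithRegionReplicas",
        "org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient",
    ]
    console = "Killed\n" * 5
    print(unexplained_kills(console, hanging))
```

A non-zero result means at least one killed process is not accounted for by the list of tests that started without finishing.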

Looking at the first test (TestMetaWithReplicas), it runs for more than two minutes and doesn't
seem to finish properly. At least two methods take longer than the prescribed medium test time
of 50 seconds. Let me move it to large. None of the tests have a timeout. Let me also add a
category-based timeout. Hopefully, once the test is large and has a timeout, a failure will
bubble up as something other than the mysterious surefire exception. Let me make TestHCM large
too.

Looking at TestSnapshotFromClientWithRegionReplicas, it is killed two seconds into the test...
doing:

client.TestSnapshotFromClientWithRegionReplicas#testListTableSnapshotsWithRegex

When I try it locally, it runs fine and promptly.

org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient is cut off in the middle of
testCloneSnapshotOfCloned after seven seconds. Normally it completes in two and a half minutes.

These tests do spew megabytes of output.


> Looking for the surefire-killer; builds being killed...
> -------------------------------------------------------
>
>                 Key: HBASE-14589
>                 URL: https://issues.apache.org/jira/browse/HBASE-14589
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14589.mx.patch, 14589.timeout.txt, 14589.txt, 14598.addendum.sufire.timeout.patch
>
>
> I see this in a build that started two hours ago... about 6:45... it's build 15941 on ubuntu-6
> {code}
> WARNING: 2 rogue build processes detected, terminating.
> /bin/kill -9 18640 
> /bin/kill -9 22625 
> {code}
> If I back up to build 15939, started about 3 1/2 hours ago, say, 5:15....  I see:
> Running org.apache.hadoop.hbase.client.TestShell
> Killed
> ... but it was running on ubuntu-1.... so it doesn't look like we are killing ourselves... when we do this in test-patch.sh
>   ### Kill any rogue build processes from the last attempt
>   $PS auxwww | $GREP ${PROJECT_NAME}PatchProcess | $AWK '{print $2}' | /usr/bin/xargs -t -I {} /bin/kill -9 {} > /dev/null
> The above code runs in a few places... in test-patch.sh.
> Let me try and add some more info around what is being killed... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)