hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12711) deadly hdfs test
Date Wed, 25 Oct 2017 19:28:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219353#comment-16219353 ]

Allen Wittenauer edited comment on HDFS-12711 at 10/25/17 7:27 PM:
-------------------------------------------------------------------

Re-launched the agent using the Jenkins UI.  But now Jenkins doesn't appear to want to schedule
*any* jobs.

Just shoot me.

... In other news, a bit of "inside baseball" that I bet a lot of people don't know.

When Yetus launches a docker container, Jenkins doesn't know how to kill it.  It sends the
equivalent of ctrl-c to the docker CLI, but the CLI doesn't seem to respond to it, so the Docker
container *continues to run*. (This is why "timed out" Jenkins jobs will still run if they are
running their docker armor.) In qbt mode, there is no JIRA to write to, so the output is actually
handled by Jenkins.  In test-patch mode, Yetus has a JIRA to write output to.  What we are
seeing is that Yetus continues to run, finishes, then says "yeah, a bunch of stuff failed.
 fix your code."  Meanwhile, outside the container, it's death and destruction and the loss
of the Jenkins agent and probably other stuff.
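
To make the container problem concrete, here is a minimal Python sketch (not anything Yetus
or Jenkins actually does) of a wrapper that gives the container an explicit name and asks the
docker daemon to kill it when the wrapper itself is interrupted. The "ci-" name prefix and the
function names are made up for illustration; `docker run --name` and `docker kill` are the only
docker features assumed.

```python
import signal
import subprocess
import uuid


def run_command(name, image, command):
    """Build the `docker run` argv with an explicit container name."""
    return ["docker", "run", "--rm", "--name", name, image] + list(command)


def kill_command(name):
    """Build the argv that asks the docker daemon to kill the named container."""
    return ["docker", "kill", name]


def run_with_cleanup(image, command):
    """Run a container and kill it by name if this wrapper gets SIGINT/SIGTERM."""
    # Hypothetical naming scheme; any unique name would do.
    name = "ci-" + uuid.uuid4().hex[:12]
    proc = subprocess.Popen(run_command(name, image, command))

    def on_signal(signum, frame):
        # Interrupting the CLI process does not necessarily stop the
        # container, so talk to the daemon directly.
        subprocess.call(kill_command(name))

    signal.signal(signal.SIGINT, on_signal)
    signal.signal(signal.SIGTERM, on_signal)
    return proc.wait()
```

A job would call something like `run_with_cleanup("some-image", ["mvn", "test"])` instead of
invoking the docker CLI directly. This only helps when the wrapper gets a catchable signal; a
SIGKILL of the wrapper would still leave the container running.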

But this does mean, at least in this run, that it was *NOT* a kernel panic, because otherwise
we would never have gotten any feedback at all.  That's fantastic news because it means there
are likely some controls that can be put around it... just a matter of whether they are
OS-level/infra or docker-related.
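
For the docker-related option, one possible shape of such controls is resource limits on
`docker run`. This is a sketch under the assumption that the damage comes from memory or
process exhaustion, which has not been established here. The limit values and the function
name are placeholders, not recommendations.

```python
def limited_run_command(image, command, memory="4g", pids=1024):
    """Build a `docker run` argv that caps memory and process count.

    The default values are illustrative placeholders only.
    """
    return [
        "docker", "run", "--rm",
        "--memory", memory,         # hard memory limit for the container
        "--memory-swap", memory,    # same value as --memory: no extra swap
        "--pids-limit", str(pids),  # cap on processes/threads in the container
        image,
    ] + list(command)
```

With limits like these, a runaway test would be expected to hit the container's ceiling
before it takes the host down, though whether that is enough in this case is an open question.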

It's worth noting that from what I can tell, surefire will report OOM'd and/or otherwise externally
killed tests as "timed out".  So there was still a lot of death and destruction inside the
container as well.
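
As an aside on telling those cases apart: a parent process can usually see whether a child
was killed by a signal rather than exiting on its own. The Python sketch below shows the
general POSIX idea only; it is not how surefire works internally, and `describe_exit` is a
made-up helper name.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code as reported by Python on POSIX.

    Python's subprocess module reports a child killed by signal N
    as a return code of -N.
    """
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited cleanly"
    return "exited with status %d" % returncode
```

A child that the kernel OOM killer or an operator took out with SIGKILL would show up here
as "killed by SIGKILL", which is a different thing from a test that simply ran too long.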


was (Author: aw):
Re-launched the agent using the Jenkins UI.  But now Jenkins doesn't appear to want to schedule
*any* jobs.

Just shoot me.

... In other news, a bit of "inside baseball" that I bet a lot of people don't know.

When Yetus launches a docker container, Jenkins doesn't know how to kill it.  It sends the
equivalent of ctrl-c to the docker CLI, but the CLI doesn't seem to respond to it, so the Docker
container *continues to run*. (This is why "timed out" tasks will still run if they are running
their docker armor.) In qbt mode, there is no JIRA to write to, so the output is actually
handled by Jenkins.  In test-patch mode, Yetus has a JIRA to write output to.  What we are
seeing is that Yetus continues to run, finishes, then says "yeah, a bunch of stuff failed.
 fix your code."  Meanwhile, outside the container, it's death and destruction and the loss
of the Jenkins agent and probably other stuff.

But this does mean, at least in this run, that it was *NOT* a kernel panic, because otherwise
we would never have gotten any feedback at all.  That's fantastic news because it means there
are likely some controls that can be put around it... just a matter of whether they are
OS-level/infra or docker-related.

It's worth noting that from what I can tell, surefire will report OOM'd and/or otherwise externally
killed tests as "timed out".  So there was still a lot of death and destruction inside the
container as well.

> deadly hdfs test
> ----------------
>
>                 Key: HDFS-12711
>                 URL: https://issues.apache.org/jira/browse/HDFS-12711
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.9.0, 2.8.2
>            Reporter: Allen Wittenauer
>            Priority: Critical
>         Attachments: HDFS-12711.branch-2.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

