hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12711) deadly hdfs test
Date Wed, 25 Oct 2017 16:40:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219020#comment-16219020 ]

Allen Wittenauer edited comment on HDFS-12711 at 10/25/17 4:39 PM:
-------------------------------------------------------------------

The problem we're seeing is that if we run the full gamut of Yetus tests (via qbt, so the
entire source tree) on branch-2, it invariably hard-crashes the node on the ASF build
infrastructure.  Always. Every time.  It also always happens during the hdfs unit tests. I'm
attempting to isolate it to see if it is *just* the HDFS unit tests that trigger the crash or
if I need to run through common, etc., first as well.  This will do two things:

* drop the reproducible test case down from 5+ hours to 1+ hours
* confirm that it is entirely in the hdfs unit tests and not from something else in the path (see the sketch below)
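
As a point of reference, here's a minimal sketch of what the narrowed run could look like if
it were driven by plain Maven instead of the full Yetus/qbt pipeline.  The checkout path below
is a placeholder and this is not the actual job configuration; hadoop-hdfs-project/hadoop-hdfs
is simply the standard module path in the source tree:

    # Hypothetical narrowed run: build once, then run only the HDFS unit tests
    # instead of the full multi-hour qbt pass over the whole tree.
    cd /path/to/branch-2-checkout        # placeholder path
    mvn -q -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs -am install
    mvn test -pl hadoop-hdfs-project/hadoop-hdfs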

My ultimate goal is to get Yetus to configure the Docker container to at least prevent the
crashes.  But I need a 'faster' way to test...
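
One plausible shape for that Docker-side fix, sketched with stock docker run options rather
than whatever Yetus actually generates (the image name, memory figures, and pids limit are
placeholders of mine):

    # Hypothetical resource caps so a runaway test JVM hits the container's
    # cgroup limit instead of exhausting the whole node.
    docker run --rm \
        --memory=20g \
        --memory-swap=20g \
        --pids-limit=10000 \
        hadoop-build-env:latest \
        mvn test -pl hadoop-hdfs-project/hadoop-hdfs

The real change would go through Yetus' own Docker handling rather than a hand-typed command;
the point is just which knobs (memory, swap, pids) would keep a misbehaving build from taking
the node down with it.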

FWIW, I'm still unsure if it is a kernel panic or just something breaking Jenkins badly enough
that it thinks the node is catatonic.  From the *one* time I was able to see logs before they
disappeared, there were a ton of OOM errors, a core dump, and more.  This leads me to believe
it is likely a kernel panic caused by the OOM killer going nuts, since it's well established
how badly the Linux kernel behaves under low memory.  (That's also why I can't really test
this at home... I'm not using Linux on my "big box".)
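
If anyone else manages to catch a node in that state, these standard Linux checks should tell
OOM-killer carnage apart from an actual panic (nothing here is specific to the ASF build boxes):

    # OOM-killer activity shows up in the kernel log / journal
    dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
    # previous boot's kernel messages, on systemd hosts
    journalctl -k -b -1 | grep -iE 'oom|panic'
    # the kernel only panics *because of* OOM when this sysctl is non-zero
    cat /proc/sys/vm/panic_on_oom

With the usual default of vm.panic_on_oom=0, a memory crunch is more likely to leave the OOM
killer shooting the Jenkins agent (and everything else) than to produce a genuine panic, which
would fit the node merely looking catatonic to Jenkins.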



> deadly hdfs test
> ----------------
>
>                 Key: HDFS-12711
>                 URL: https://issues.apache.org/jira/browse/HDFS-12711
>             Project: Hadoop HDFS
>          Issue Type: Test
>            Reporter: Allen Wittenauer
>         Attachments: HDFS-12711.branch-2.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


