hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6813) TestAMRMProxy#testE2ETokenRenewal fails sporadically due to race conditions
Date Wed, 12 Jul 2017 19:38:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084559#comment-16084559 ]

Jason Lowe commented on YARN-6813:

I'm not a fan of either approach: one creates sporadic failures because the timeouts
are too aggressive, and the other makes the test take far too long to run.  If the whole
point of all this waiting is to get a token renewed, I'd rather do one or more of the following:
# Add a way for tests to force a token renewal rather than needing to wait some specific amount
of wall clock time
# Programmatically invoke/control the heartbeating for both the NM and the AM in the test
so we aren't needlessly waiting in the test for a heartbeat timer to expire
# Move the relevant pieces over to a controlled clock where we can programmatically speed
up time if necessary to trigger certain time-based events
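
As a rough illustration of the second and third options, here is a minimal sketch of a
hand-advanced clock driving an expiry check. All names (TimeSource, ManualClock,
ExpiryTracker) are hypothetical and are not Hadoop classes; the sketch only assumes the
expiry logic can be handed a clock instead of reading wall clock time directly.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names for illustration only; none of these are Hadoop classes.
interface TimeSource {
    long nowMillis();
}

// A clock the test advances by hand instead of waiting on wall clock time.
final class ManualClock implements TimeSource {
    private final AtomicLong now = new AtomicLong();

    @Override
    public long nowMillis() {
        return now.get();
    }

    void advance(long millis) {
        now.addAndGet(millis);
    }
}

// Stand-in for a liveness monitor: the entity counts as expired when more than
// expiryMillis have passed on the injected clock since the last heartbeat.
final class ExpiryTracker {
    private final TimeSource clock;
    private final long expiryMillis;
    private long lastHeartbeat;

    ExpiryTracker(TimeSource clock, long expiryMillis) {
        this.clock = clock;
        this.expiryMillis = expiryMillis;
        this.lastHeartbeat = clock.nowMillis();
    }

    // The test invokes this directly rather than waiting for a heartbeat timer.
    void heartbeat() {
        lastHeartbeat = clock.nowMillis();
    }

    boolean isExpired() {
        return clock.nowMillis() - lastHeartbeat > expiryMillis;
    }
}
```

A test built this way would call heartbeat() and advance() in whatever order it needs, so
an expiry or renewal is triggered deterministically and no real time passes.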

A typical tell-tale sign of a unit test with race conditions is a call to Thread.sleep.
The test will sometimes fail because something didn't run fast enough relative to
the sleep, or it will run needlessly longer than it should.  Usually both.
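
One common alternative to sleeping is to block on the event itself, with a timeout that
serves only as a safety bound. The sketch below assumes the component under test can
signal completion; FakeRenewer and its methods are hypothetical stand-ins, not part of
the test in question.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a component that renews a token on another thread.
final class FakeRenewer {
    private final CountDownLatch renewed = new CountDownLatch(1);

    // Simulates an asynchronous renewal that signals the latch when it completes.
    void renewAsync() {
        Thread worker = new Thread(renewed::countDown);
        worker.setDaemon(true);
        worker.start();
    }

    // Returns as soon as the renewal has happened. The timeout is an upper
    // bound for a broken run, not the amount of time the test expects to wait.
    boolean awaitRenewal(long timeoutMillis) {
        try {
            return renewed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape the timeout can be generous without slowing down a passing run, since
the wait ends the moment the event occurs.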

> TestAMRMProxy#testE2ETokenRenewal fails sporadically due to race conditions
> ---------------------------------------------------------------------------
>                 Key: YARN-6813
>                 URL: https://issues.apache.org/jira/browse/YARN-6813
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.8.1
>            Reporter: Jason Lowe
> The testE2ETokenRenewal test lowers the AM and nodemanager expiry intervals to only
1.5 seconds.  This leaves very little headroom over the default heartbeat intervals of 1 second.
If the AM hits a hiccup and runs a bit slower than expected the unit test can fail because
the RM expires the AM.

This message was sent by Atlassian JIRA
