ambari-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-19416) Ambari agents remain in heartbeat lost state after ambari server restart
Date Mon, 09 Jan 2017 07:31:58 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810928#comment-15810928 ]

Hadoop QA commented on AMBARI-19416:
------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12846213/AMBARI-19416.v2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in ambari-agent.

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/9957//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/9957//console

This message is automatically generated.

> Ambari agents remain in heartbeat lost state after ambari server restart
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-19416
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19416
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Sebastian Toader
>            Assignee: Sebastian Toader
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: AMBARI-19416.v2.patch
>
>
> With the implementation of https://issues.apache.org/jira/browse/AMBARI-18505, status commands are executed in a separate child process. Status commands that the ambari agent receives from the server are passed to the status command executor child process via a queue ({{multiprocessing.Queue()}}). If the child process is killed, either manually or by the parent process, the queue may end up in a bad state (see: http://bugs.python.org/issue20527), so the re-spawned status command executor child process may no longer receive new status commands.
> When ambari server is restarted, the agent re-registers with ambari server and, upon re-registration, re-spawns the status command child process so that it receives up-to-date agent configs (https://issues.apache.org/jira/browse/AMBARI-19392). In this case, because the queue may be stuck, the status commands are not received by the status command executor child process, which leaves the ambari agent in the heartbeat lost state.
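A minimal Python sketch of the pattern described above, and of one possible way to recover from it, assuming Python 3's `multiprocessing` module with the Unix `fork` start method. `status_executor`, `StatusExecutorHandle`, the `"STOP"` sentinel and the `"STARTED"` result are hypothetical names, not the agent's real API, and this is not necessarily what the AMBARI-19416 patch does. The Python `multiprocessing` documentation warns that killing a process with `Process.terminate()` while it is using a `Queue` can corrupt the queue for other processes, so the sketch gives every re-spawned child brand-new queues instead of reusing the old ones.

```python
import multiprocessing as mp

# Assumption: the Unix "fork" start method (the agent runs on Linux).
ctx = mp.get_context("fork")

STOP = "STOP"  # hypothetical shutdown sentinel


def status_executor(commands, results):
    """Hypothetical child loop: execute status commands sent by the parent."""
    while True:
        # In CPython, a blocking get() holds the queue's shared read lock while
        # it waits; killing this process here can leave that lock held.
        cmd = commands.get()
        if cmd == STOP:
            break
        # A real executor would run the component's status check here.
        results.put((cmd, "STARTED"))


class StatusExecutorHandle:
    """Hypothetical parent-side owner of the child process and its queues."""

    def __init__(self):
        self.child = None
        self.commands = None
        self.results = None

    def spawn(self):
        # Fresh queues for every child: never hand a new child a queue that a
        # killed process may have left in a bad state.
        self.commands = ctx.Queue()
        self.results = ctx.Queue()
        self.child = ctx.Process(
            target=status_executor,
            args=(self.commands, self.results),
            daemon=True,
        )
        self.child.start()

    def respawn(self):
        # E.g. after re-registering with a restarted server: kill the old
        # child, then start a new one with new queues.
        if self.child is not None:
            self.child.terminate()
            self.child.join()
        self.spawn()

    def run(self, cmd, timeout=10):
        # Send one command and wait for its result; the timeout turns a stuck
        # queue into a queue.Empty exception instead of a silent hang.
        self.commands.put(cmd)
        return self.results.get(timeout=timeout)

    def stop(self):
        self.commands.put(STOP)
        self.child.join()


if __name__ == "__main__":
    handle = StatusExecutorHandle()
    handle.spawn()
    print(handle.run("DATANODE"))
    handle.respawn()  # simulate re-registration after a server restart
    print(handle.run("NAMENODE"))
    handle.stop()
```

Creating new queues on every re-spawn sidesteps the broken-queue problem, at the cost of discarding any commands still sitting in the old queue; the parent would have to resend them.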



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
