ambari-user mailing list archives

From Sumit Mohanty <smoha...@hortonworks.com>
Subject Re: All processes are waiting during Cluster install
Date Sun, 13 Jul 2014 15:43:58 GMT
By "I restarted the process", do you mean that you restarted the installation?

Can you share the command logs for the tasks (e.g. 10, 42, 58, etc.)? These
would help debug why the tasks are still active.

In the Ambari UI, open the past requests (top left). The task-specific view
shows the hosts and the local file names on each host. For task id 10, the
files are named /var/lib/ambari-agent/data/output-10.txt and
/var/lib/ambari-agent/data/errors-10.txt.
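
If it is easier than clicking through the UI, the same files can be read
directly on each agent host. The following is only a minimal sketch, assuming
the naming pattern above holds for every task id; the helper names
(task_log_paths, tail) are made up for illustration and are not part of Ambari.

```python
from pathlib import Path

# Directory taken from the file names mentioned in this thread.
# Assumption: every task follows the output-<id>.txt / errors-<id>.txt pattern.
AGENT_DATA_DIR = Path("/var/lib/ambari-agent/data")


def task_log_paths(task_id, data_dir=AGENT_DATA_DIR):
    """Return the (output, errors) log paths for one task id."""
    return (
        data_dir / f"output-{task_id}.txt",
        data_dir / f"errors-{task_id}.txt",
    )


def tail(path, lines=20):
    """Return the last `lines` lines of a file, or a note if it is missing."""
    if not path.is_file():
        return f"(not found: {path})"
    text = path.read_text(errors="replace")
    return "\n".join(text.splitlines()[-lines:])


if __name__ == "__main__":
    # Task ids that appear in the server log quoted below.
    for task_id in (10, 26, 42, 57, 58):
        for path in task_log_paths(task_id):
            print(f"==> {path} <==")
            print(tail(path))
```

Run it on each host that owns one of the tasks; a task's files only exist on
the host that executed it, so "not found" on the other hosts is expected.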

What I can surmise from the above is that the agents are still stuck
executing the older tasks, so they cannot execute the new commands that
Ambari Server sent when you retried the installation. I suggest looking at
the command logs to see why they are stuck. Restarting Ambari Server may not
help; if the agents are stuck executing the tasks, you may need to restart
the agents.
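
To see which task ids the agents keep reporting, the server log can be
summarized with a few lines of Python. This is a rough sketch, assuming the
raw log has each "The task N is not in progress" message on a single line (the
copy quoted below is wrapped by the mail client); the function name
stale_task_counts is hypothetical, and the log path is passed in as an
argument rather than assumed.

```python
import re
import sys
from collections import Counter

# Matches the ActionManager warning seen in the quoted server log.
STALE_TASK = re.compile(r"The task\s+(\d+)\s+is\s+not\s+in\s+progress")


def stale_task_counts(log_text):
    """Count how often each task id is reported as 'not in progress'."""
    return Counter(int(m.group(1)) for m in STALE_TASK.finditer(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stale_tasks.py <path to the Ambari Server log>
    with open(sys.argv[1], errors="replace") as handle:
        counts = stale_task_counts(handle.read())
    for task_id, count in sorted(counts.items()):
        print(f"task {task_id}: {count} stale reports")
```

The task ids it prints are the ones whose output and errors files are worth
checking first on the agent hosts.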

-Sumit


On Sun, Jul 13, 2014 at 8:00 AM, Suraj Nayak M <snayakm@gmail.com> wrote:

> Hi,
>
> I am trying to install HDP2.1 using Ambari on 4 nodes. 2 NN and 2 Slaves.
> The install failed due to python script timeout. I restarted the process.
> From past 2hrs there is no progress in the installation. Is it safe to kill
> the ambari server and restart the process ? How can I terminate the ongoing
> process in Ambari gracefully ?
>
> Below is tail of the Ambari-Server logs.
>
> 20:12:08,530  WARN [qtp527311109-183] HeartBeatHandler:369 - Operation
> failed - may be retried. Service component host: HIVE_CLIENT, host:
> slave2.hdp.somedomain.com Action id1-1
> 20:12:08,530  INFO [qtp527311109-183] HeartBeatHandler:375 - Received
> report for a command that is no longer active. CommandReport{role='HIVE_CLIENT',
> actionId='1-1', status='FAILED', exitCode=999, clusterName='HDP2_CLUSTER1',
> serviceName='HIVE', taskId=57, roleCommand=INSTALL, configurationTags=null,
> customCommand=null}
> 20:12:08,530  WARN [qtp527311109-183] ActionManager:143 - The task 57 is
> not in progress, ignoring update
> 20:12:08,966  WARN [qtp527311109-183] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:12:12,319  WARN [qtp527311109-183] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:12:12,605  WARN [qtp527311109-183] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:12:14,872  WARN [qtp527311109-183] ActionManager:143 - The task 10 is
> not in progress, ignoring update
> 20:12:19,039  WARN [qtp527311109-184] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:12:22,382  WARN [qtp527311109-183] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:12:22,655  WARN [qtp527311109-183] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:12:24,919  WARN [qtp527311109-184] ActionManager:143 - The task 10 is
> not in progress, ignoring update
> 20:12:29,086  WARN [qtp527311109-184] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:12:32,576  WARN [qtp527311109-183] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:12:32,704  WARN [qtp527311109-183] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:12:34,955  WARN [qtp527311109-183] ActionManager:143 - The task 10 is
> not in progress, ignoring update
> 20:12:39,132  WARN [qtp527311109-183] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:12:42,629  WARN [qtp527311109-184] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:12:42,754  WARN [qtp527311109-184] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:12:45,137  WARN [qtp527311109-183] ActionManager:143 - The task 10 is
> not in progress, ignoring update
> 20:12:49,320  WARN [qtp527311109-183] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:12:52,962  WARN [qtp527311109-184] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:12:53,093  WARN [qtp527311109-184] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:12:55,184  WARN [qtp527311109-184] ActionManager:143 - The task 10 is
> not in progress, ignoring update
> 20:12:59,366  WARN [qtp527311109-184] ActionManager:143 - The task 26 is
> not in progress, ignoring update
> 20:13:03,013  WARN [qtp527311109-184] ActionManager:143 - The task 58 is
> not in progress, ignoring update
> 20:13:03,257  WARN [qtp527311109-184] ActionManager:143 - The task 42 is
> not in progress, ignoring update
> 20:13:05,231  WARN [qtp527311109-184] ActionManager:143 - The task 10 is
> not in progress, ignoring update
>
>
> --
> Thanks
> Suraj Nayak
>

