ambari-user mailing list archives

From Suraj Nayak M <snay...@gmail.com>
Subject Re: All processes are waiting during Cluster install
Date Sun, 13 Jul 2014 21:36:01 GMT
Now all the processes are up and running except one: the
ApplicationHistoryServer.

Attached is the log file of the ApplicationHistoryServer. It reports a
ClassNotFoundException:

java.lang.ClassNotFoundException: Class
org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore

Am I missing anything in the configuration?
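
A minimal check (a sketch; the path /usr/lib/hadoop-yarn is an assumption
about the HDP 2.1 package layout) would be to confirm the class actually
ships in one of the installed YARN jars:

    # Assumption: HDP 2.1 places the YARN server jars under /usr/lib/hadoop-yarn.
    # List the application-history jars and grep their contents for the class.
    for jar in /usr/lib/hadoop-yarn/hadoop-yarn-server-applicationhistoryservice*.jar; do
        echo "== $jar"
        unzip -l "$jar" | grep -i LeveldbTimelineStore
    done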

--
Suraj Nayak

On Monday 14 July 2014 02:04 AM, Suraj Nayak M wrote:
> Sid,
>
> Thanks for your suggestion.
>
> *mysql-connector-java* was the initial error. That was solved after a 
> long wait. (I will try your suggestion in my next install :-) )
> Below are my attempts leading up to the successful install:
>
> *Try-1* : Started the cluster install. A few components failed. (The
> mysql-connector-java install was still running via the agents.)
> *Try-2* : Used the Retry option from the UI. All processes were waiting.
> After a long time (once the mysql-connector-java install finished), all the
> waiting processes started. A few components installed successfully and a few
> failed due to a Python script timeout error.
> *Try-3* : Used the Retry option from the UI. The previously failed component
> installs succeeded. Again, a Python script timeout occurred during the Oozie
> client install (screenshot attached in the previous mail).
> *Try-4* : Success. (There were some warnings due to JAVA_HOME, which I am
> resolving now.)
>
> Can I increase the timeout of the Python script that was failing so often
> during the install?
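>
> One hedged option (assuming this Ambari version reads the agent command
> timeout, in seconds, from the agent.task.timeout property in
> /etc/ambari-server/conf/ambari.properties; the property name is an
> assumption) would be something like:
>
>     # Raise the agent task timeout, then restart the Ambari server.
>     echo "agent.task.timeout=1800" >> /etc/ambari-server/conf/ambari.properties
>     ambari-server restart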
>
> --
> Suraj Nayak
>
> On Monday 14 July 2014 01:29 AM, Siddharth Wagle wrote:
>> Try a "yum clean all" and a "yum install *mysql-connector-java*" from the
>> command line on the hosts with any HIVE or OOZIE components.
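>>
>> Spelled out, roughly (run as root on each host that carries HIVE or OOZIE
>> components):
>>
>>     yum clean all
>>     yum install mysql-connector-java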
>>
>> Then retry from UI.
>>
>> -Sid
>>
>>
>> On Sun, Jul 13, 2014 at 12:36 PM, Suraj Nayak M <snayakm@gmail.com> wrote:
>>
>>     Hi Sumit,
>>
>>     "I restarted the process" meant - I restarted the deployment from
>>     the UI(Using Retry button in the browser).
>>
>>     You were right. The task 10 was stuck at *mysql-connector-java*
>>     installation :)
>>
>>     2014-07-13 20:05:32,755 - Repository['HDP-2.1'] {'action':
>>     ['create'], 'mirror_list': None, 'base_url':
>>     'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
>>     'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
>>     2014-07-13 20:05:32,761 - File['/etc/yum.repos.d/HDP.repo']
>>     {'content': InlineTemplate(...)}
>>     2014-07-13 20:05:32,762 - Package['hive'] {}
>>     2014-07-13 20:05:32,780 - Installing package hive ('/usr/bin/yum
>>     -d 0 -e 0 -y install hive')
>>     2014-07-13 20:08:32,772 - Package['mysql-connector-java'] {}
>>     2014-07-13 20:08:32,802 - Installing package mysql-connector-java
>>     ('/usr/bin/yum -d 0 -e 0 -y install mysql-connector-java')
>>
>>     I have also noticed that if the network is slow, the install succeeds
>>     for a few components and fails for a few others. On retry (from the
>>     UI), the install continues from the point of failure and the
>>     previously failed components succeed. The cycle continues this way
>>     until all the components are installed. Is there any way I can
>>     increase the timeout of the Python script? Or could Ambari handle the
>>     following condition automatically:
>>
>>     "If the error is due to a Python script timeout, restart the process"?
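>>
>>     A quick way to confirm an agent is only waiting on a slow package
>>     download (a sketch, assuming yum as in the log above):
>>
>>         # Is the yum install started by the agent still running?
>>         pgrep -fl yum
>>         # Follow package completions instead of restarting mid-download.
>>         tail -f /var/log/yum.log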
>>
>>     The network was slow for some reason. The installation failed and the
>>     error below was displayed (screenshot attached).
>>
>>     *Details of the error:*
>>
>>     *ERROR :* Python script has been killed due to timeout.
>>
>>     The file */var/lib/ambari-agent/data/errors-181.txt* does not contain
>>     any data.
>>
>>     Content of */var/lib/ambari-agent/data/output-181.txt*
>>
>>     2014-07-14 00:07:01,673 - Package['unzip'] {}
>>     2014-07-14 00:07:01,770 - Skipping installing existent package unzip
>>     2014-07-14 00:07:01,772 - Package['curl'] {}
>>     2014-07-14 00:07:01,872 - Skipping installing existent package curl
>>     2014-07-14 00:07:01,874 - Package['net-snmp-utils'] {}
>>     2014-07-14 00:07:01,966 - Skipping installing existent package
>>     net-snmp-utils
>>     2014-07-14 00:07:01,967 - Package['net-snmp'] {}
>>     2014-07-14 00:07:02,060 - Skipping installing existent package
>>     net-snmp
>>     2014-07-14 00:07:02,064 - Group['hadoop'] {}
>>     2014-07-14 00:07:02,069 - Modifying group hadoop
>>     2014-07-14 00:07:02,141 - Group['users'] {}
>>     2014-07-14 00:07:02,142 - Modifying group users
>>     2014-07-14 00:07:02,222 - Group['users'] {}
>>     2014-07-14 00:07:02,224 - Modifying group users
>>     2014-07-14 00:07:02,306 - User['ambari-qa'] {'gid': 'hadoop',
>>     'groups': [u'users']}
>>     2014-07-14 00:07:02,307 - Modifying user ambari-qa
>>     2014-07-14 00:07:02,380 - File['/tmp/changeUid.sh'] {'content':
>>     StaticFile('changeToSecureUid.sh'), 'mode': 0555}
>>     2014-07-14 00:07:02,385 - Execute['/tmp/changeUid.sh ambari-qa
>>     /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
>>     2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
>>     2014-07-14 00:07:02,454 - Skipping Execute['/tmp/changeUid.sh
>>     ambari-qa
>>     /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
>>     2>/dev/null'] due to not_if
>>     2014-07-14 00:07:02,456 - User['hbase'] {'gid': 'hadoop',
>>     'groups': [u'hadoop']}
>>     2014-07-14 00:07:02,456 - Modifying user hbase
>>     2014-07-14 00:07:02,528 - File['/tmp/changeUid.sh'] {'content':
>>     StaticFile('changeToSecureUid.sh'), 'mode': 0555}
>>     2014-07-14 00:07:02,531 - Execute['/tmp/changeUid.sh hbase
>>     /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase 2>/dev/null']
>>     {'not_if': 'test $(id -u hbase) -gt 1000'}
>>     2014-07-14 00:07:02,600 - Skipping Execute['/tmp/changeUid.sh
>>     hbase
>>     /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase 2>/dev/null']
>>     due to not_if
>>     2014-07-14 00:07:02,602 - Group['nagios'] {}
>>     2014-07-14 00:07:02,602 - Modifying group nagios
>>     2014-07-14 00:07:02,687 - User['nagios'] {'gid': 'nagios'}
>>     2014-07-14 00:07:02,689 - Modifying user nagios
>>     2014-07-14 00:07:02,757 - User['oozie'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:02,758 - Modifying user oozie
>>     2014-07-14 00:07:02,826 - User['hcat'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:02,828 - Modifying user hcat
>>     2014-07-14 00:07:02,897 - User['hcat'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:02,898 - Modifying user hcat
>>     2014-07-14 00:07:02,964 - User['hive'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:02,965 - Modifying user hive
>>     2014-07-14 00:07:03,032 - User['yarn'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:03,034 - Modifying user yarn
>>     2014-07-14 00:07:03,099 - Group['nobody'] {}
>>     2014-07-14 00:07:03,100 - Modifying group nobody
>>     2014-07-14 00:07:03,178 - Group['nobody'] {}
>>     2014-07-14 00:07:03,179 - Modifying group nobody
>>     2014-07-14 00:07:03,260 - User['nobody'] {'gid': 'hadoop',
>>     'groups': [u'nobody']}
>>     2014-07-14 00:07:03,261 - Modifying user nobody
>>     2014-07-14 00:07:03,330 - User['nobody'] {'gid': 'hadoop',
>>     'groups': [u'nobody']}
>>     2014-07-14 00:07:03,332 - Modifying user nobody
>>     2014-07-14 00:07:03,401 - User['hdfs'] {'gid': 'hadoop',
>>     'groups': [u'hadoop']}
>>     2014-07-14 00:07:03,403 - Modifying user hdfs
>>     2014-07-14 00:07:03,471 - User['mapred'] {'gid': 'hadoop',
>>     'groups': [u'hadoop']}
>>     2014-07-14 00:07:03,473 - Modifying user mapred
>>     2014-07-14 00:07:03,544 - User['zookeeper'] {'gid': 'hadoop'}
>>     2014-07-14 00:07:03,545 - Modifying user zookeeper
>>     2014-07-14 00:07:03,616 - User['storm'] {'gid': 'hadoop',
>>     'groups': [u'hadoop']}
>>     2014-07-14 00:07:03,618 - Modifying user storm
>>     2014-07-14 00:07:03,688 - User['falcon'] {'gid': 'hadoop',
>>     'groups': [u'hadoop']}
>>     2014-07-14 00:07:03,689 - Modifying user falcon
>>     2014-07-14 00:07:03,758 - User['tez'] {'gid': 'hadoop', 'groups':
>>     [u'users']}
>>     2014-07-14 00:07:03,760 - Modifying user tez
>>     2014-07-14 00:07:04,073 - Repository['HDP-2.1'] {'action':
>>     ['create'], 'mirror_list': None, 'base_url':
>>     'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
>>     'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
>>     2014-07-14 00:07:04,084 - File['/etc/yum.repos.d/HDP.repo']
>>     {'content': InlineTemplate(...)}
>>     2014-07-14 00:07:04,086 - Package['oozie'] {}
>>     2014-07-14 00:07:04,177 - Installing package oozie ('/usr/bin/yum
>>     -d 0 -e 0 -y install oozie')
>>
>>     --
>>     Suraj Nayak
>>
>>
>>     On Sunday 13 July 2014 09:13 PM, Sumit Mohanty wrote:
>>>     By "I restarted the process" do you mean that you restarted the
>>>     installation?
>>>
>>>     Can you share the command logs for tasks (e.g. 10, 42, 58,
>>>     etc.)? These would help debug why the tasks are still active.
>>>
>>>     If you look at the past requests in the Ambari UI (top left), the
>>>     task-specific UI will show you the hosts and the local file names on
>>>     each host. The files are named
>>>     /var/lib/ambari-agent/data/output-10.txt and
>>>     /var/lib/ambari-agent/data/errors-10.txt for task id 10.
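>>>
>>>     For example, for task id 10 on the host that ran it:
>>>
>>>         # Inspect the command output and error file for task id 10.
>>>         tail -n 50 /var/lib/ambari-agent/data/output-10.txt
>>>         cat /var/lib/ambari-agent/data/errors-10.txt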
>>>
>>>     What I can surmise from the above is that the agents are still stuck
>>>     executing the older tasks, so they cannot execute the new commands
>>>     sent by the Ambari server when you retried the installation. I
>>>     suggest looking at the command logs to see why they are stuck.
>>>     Restarting the Ambari server may not help, as you may need to restart
>>>     the agents if they are stuck executing the tasks.
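>>>
>>>     If an agent does turn out to be stuck, a restart on the affected host
>>>     is usually enough (a sketch):
>>>
>>>         ambari-agent status
>>>         ambari-agent restart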
>>>
>>>     -Sumit
>>>
>>>
>>>     On Sun, Jul 13, 2014 at 8:00 AM, Suraj Nayak M
>>>     <snayakm@gmail.com> wrote:
>>>
>>>         Hi,
>>>
>>>         I am trying to install HDP 2.1 using Ambari on 4 nodes: 2 NN and
>>>         2 slaves. The install failed due to a Python script timeout and I
>>>         restarted the process. For the past 2 hours there has been no
>>>         progress in the installation. Is it safe to kill the Ambari
>>>         server and restart the process? How can I terminate the ongoing
>>>         process in Ambari gracefully?
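>>>
>>>         (For reference, the Ambari server itself can be stopped and
>>>         started cleanly with its service script; a sketch:)
>>>
>>>             ambari-server status
>>>             ambari-server stop
>>>             ambari-server start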
>>>
>>>         Below is the tail of the Ambari server log.
>>>
>>>         20:12:08,530  WARN [qtp527311109-183] HeartBeatHandler:369 -
>>>         Operation failed - may be retried. Service component host:
>>>         HIVE_CLIENT, host: slave2.hdp.somedomain.com Action id1-1
>>>         20:12:08,530  INFO [qtp527311109-183] HeartBeatHandler:375 -
>>>         Received report for a command that is no longer active.
>>>         CommandReport{role='HIVE_CLIENT', actionId='1-1',
>>>         status='FAILED', exitCode=999, clusterName='HDP2_CLUSTER1',
>>>         serviceName='HIVE', taskId=57, roleCommand=INSTALL,
>>>         configurationTags=null, customCommand=null}
>>>         20:12:08,530  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 57 is not in progress, ignoring update
>>>         20:12:08,966  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:12:12,319  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:12:12,605  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:12:14,872  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>         20:12:19,039  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:12:22,382  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:12:22,655  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:12:24,919  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>         20:12:29,086  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:12:32,576  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:12:32,704  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:12:34,955  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>         20:12:39,132  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:12:42,629  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:12:42,754  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:12:45,137  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>         20:12:49,320  WARN [qtp527311109-183] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:12:52,962  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:12:53,093  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:12:55,184  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>         20:12:59,366  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 26 is not in progress, ignoring update
>>>         20:13:03,013  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 58 is not in progress, ignoring update
>>>         20:13:03,257  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 42 is not in progress, ignoring update
>>>         20:13:05,231  WARN [qtp527311109-184] ActionManager:143 -
>>>         The task 10 is not in progress, ignoring update
>>>
>>>
>>>         --
>>>         Thanks
>>>         Suraj Nayak
>>>
>>>
>>>
>

