incubator-ambari-user mailing list archives

From xu peng <xupeng.b...@gmail.com>
Subject Re: Problem when setting up hadoop cluster step 2
Date Fri, 17 Aug 2012 04:27:44 GMT
Hi Hitesh,

It is my pleasure to file an Ambari JIRA to help other users. In fact, I
want to summarize all the problems I hit before I manage to install the
Ambari cluster successfully, and I will report back as soon as possible.

Here is another problem I encountered while installing Hadoop using Ambari:
the RPM package "hadoop-lzo-native" is not in the HDP repo
(baseurl=http://public-repo-1.hortonworks.com/HDP-1.0.13/repos/centos5),
so I failed again during the deploy step.
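A quick way to double-check whether the package really is absent from that repo
(a sketch only: the repo id "HDP-1.0.13" and the spelling "hadoop-lzo-native" are
assumptions based on the baseurl above):

    # list everything the HDP repo offers that mentions lzo
    yum --disablerepo='*' --enablerepo='HDP-1.0.13' list available | grep -i lzo

    # or query the repo metadata directly (repoquery comes from yum-utils)
    repoquery --repoid=HDP-1.0.13 'hadoop-lzo*'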

The attached file is the deploy log; please take a look.

Thanks a lot, and I look forward to your reply.


On Tue, Aug 14, 2012 at 11:35 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
> Ok - the cert issue is sometimes a result of uninstalling and re-installing ambari agents.
>
> The re-install causes ambari agents to regenerate a new certificate, and if the master
> was bootstrapped earlier, it would still be looking to match against the old certs.
>
> Stop ambari master and remove ambari-agent rpm from all hosts.
>
> To fix this:
>    - on the master, do a puppet cert revoke for all hosts ( http://docs.puppetlabs.com/man/cert.html )
>    - you can do a cert list to get all signed or non-signed hosts
>
> On all hosts, delete the following dirs ( if they exist ) :
>    - /etc/puppet/ssl
>    - /etc/puppet/[master|agent]/ssl/
>    - /var/lib/puppet/ssl/
>
>
> After doing the above, re-install the ambari agent.
>
> On the ambari master, stop the master. Run the following command:
>
> puppet master --no-daemonize --debug
>
> The above runs in the foreground. The reason to run this is to make sure the cert for
> the master is recreated, as we deleted it earlier.
>
> Now, kill the above process running in the foreground and do a service ambari start to
> bring up the UI.
>
> You should be able to bootstrap from this point on.
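A condensed sketch of the recovery sequence described above, run as root. The service and
package names ("ambari", "ambari-agent") and the puppet cert commands follow the wording in
this thread; the hostname is a placeholder, so adjust for your environment:

    # on the master: stop ambari, then revoke the stale agent certs
    service ambari stop
    puppet cert list --all                  # shows every signed / pending cert
    puppet cert revoke vbaby2.cloud.eb      # repeat for each agent host

    # on every host: remove ambari-agent and the old puppet ssl state, if present
    yum remove -y ambari-agent
    rm -rf /etc/puppet/ssl /etc/puppet/master/ssl /etc/puppet/agent/ssl /var/lib/puppet/ssl

    # re-install the agent on every host
    yum install -y ambari-agent

    # on the master: recreate the master cert in the foreground, then Ctrl-C it
    puppet master --no-daemonize --debug

    # bring the UI back up and retry the bootstrap
    service ambari start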
>
> Would you mind filing a jira mentioning all the various issues you have come across
> and how you solved them? We can use that to create an FAQ for other users.
>
> thanks
> -- Hitesh
>
>
> On Aug 14, 2012, at 1:55 AM, xu peng wrote:
>
>> Hi  Hitesh :
>>
>> Thanks a lot for your reply.
>>
>> 1. I did a puppet kick --ping to the clients from my ambari master;
>> all five nodes failed with the same log:
>> (Triggering vbaby2.cloud.eb
>> Host vbaby2.cloud.eb failed: certificate verify failed.  This is often
>> because the time is out of sync on the server or client
>> vbaby2.cloud.eb finished with exit code 2)
>>
>> I manually ran "service ambari-agent start"; is that necessary? How
>> can I fix this problem?
>>
>> 2. As you suggested, I ran the yum command manually and found that the
>> installation was missing a dependency, php-gd. I had to update my
>> yum repo.
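A small sketch of how one might confirm and install that missing dependency on each node
(php-gd normally comes from the base CentOS/RHEL repos; which repo actually provides it
here is an assumption, hence the whatprovides check first):

    yum whatprovides php-gd      # shows which configured repo, if any, offers it
    yum install -y php-gd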
>>
>>
>>
>> On Tue, Aug 14, 2012 at 1:01 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>> Based on your deploy error log:
>>>
>>> "3": {
>>>        "nodeReport": {
>>>            "PUPPET_KICK_FAILED": [],
>>>            "PUPPET_OPERATION_FAILED": [
>>>                "vbaby3.cloud.eb",
>>>                "vbaby5.cloud.eb",
>>>                "vbaby4.cloud.eb",
>>>                "vbaby2.cloud.eb",
>>>                "vbaby6.cloud.eb",
>>>                "vbaby1.cloud.eb"
>>>            ],
>>>            "PUPPET_OPERATION_TIMEDOUT": [
>>>                "vbaby5.cloud.eb",
>>>                "vbaby4.cloud.eb",
>>>                "vbaby2.cloud.eb",
>>>                "vbaby6.cloud.eb",
>>>                "vbaby1.cloud.eb"
>>>            ],
>>>
>>> 5 nodes timed out, which means the puppet agent is not running on them or they
>>> cannot communicate with the master. Try doing a puppet kick --ping to them from the master.
>>>
>>> For the one which failed, it failed at
>>>
>>> "\"Mon Aug 13 11:54:17 +0800 2012 /Stage[1]/Hdp::Pre_install_pkgs/Hdp::Exec[yum
install $pre_installed_pkgs]/Exec[yum install $pre_installed_pkgs]/returns (err): change from
notrun to 0 failed: yum install -y hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin
hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hdp_mon_dashboard
ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons snappy snappy-devel returned 1 instead of
one of [0] at /etc/puppet/agent/modules/hdp/manifests/init.pp:265\"",
>>>
>>> It seems like yum install failed on the host. Try running the command manually
>>> and see what the error is.
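A hedged sketch of what "running the command manually" could look like on one of the failed
nodes, using the package list from the puppet error quoted above (the repo configuration is
assumed to already point at the HDP baseurl):

    yum install -y hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin \
        hadoop-lzo hdp_mon_dashboard ganglia-gmond-3.2.0 gweb \
        hdp_mon_ganglia_addons snappy snappy-devel
    echo "exit code: $?"    # anything other than 0 reproduces the puppet failure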
>>>
>>> -- Hitesh
>>>
>>>
>>>
>>> On Aug 13, 2012, at 2:28 AM, xu peng wrote:
>>>
>>>> Hi Hitesh :
>>>>
>>>> It's me again.
>>>>
>>>> Following your advice, I reinstalled the ambari server, but deploying the
>>>> cluster and uninstalling it failed again. I really don't know why.
>>>>
>>>> I attached an archive containing the logs of all the nodes in
>>>> my cluster (/var/log/puppet_*.log, /var/log/puppet/*.log,
>>>> /var/log/yum.log, /var/log/hmc/hmc.log). vbaby3.cloud.eb is the
>>>> ambari server. Please take a look.
>>>>
>>>> The attachments DeployError and UninstallError are the logs shown by the
>>>> Ambari website on failure, and DeployingDetails.jpg shows the deploy
>>>> details of my cluster.
>>>>
>>>>
>>>> Thanks again for your patience! I look forward to your reply.
>>>>
>>>> Xupeng
>>>>
>>>>> On Sat, Aug 11, 2012 at 10:56 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>> For uninstall failures, you will need to do a couple of things. Depending
>>>>> on where the uninstall failed, you may have to manually do a killall java on all the nodes
>>>>> to kill any missed processes. If you want to start with a completely clean install, you should
>>>>> also delete the hadoop dir in the mount points you selected during the previous install so
>>>>> that the fresh install does not face errors when it tries to re-format hdfs.
>>>>>
>>>>> After that, simply uninstall and re-install the ambari rpm, and that should
>>>>> allow you to re-create a fresh cluster.
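A minimal sketch of that cleanup, run as root (the "/grid/0" mount point is a placeholder
for whatever directories were selected during the previous install, and the rpm name
"ambari" simply follows the wording above):

    # on every node: kill any hadoop daemons the failed uninstall left behind
    killall java

    # remove the old hdfs data so the fresh install can re-format cleanly
    rm -rf /grid/0/hadoop

    # then on the master: replace the ambari rpm and start over
    yum remove -y ambari
    yum install -y ambari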
>>>>>
>>>>> -- Hitesh
>>>>>
>>>>> On Aug 11, 2012, at 2:34 AM, xu peng wrote:
>>>>>
>>>>>> Hi Hitesh :
>>>>>>
>>>>>> Thanks a lot for your reply.
>>>>>>
>>>>>> I solved this problem; it was a silly mistake. Someone had changed the
>>>>>> owner of the "/" dir, and according to the error log, pdsh needs it to be
>>>>>> owned by root to proceed.
>>>>>>
>>>>>> After changing the owner of "/" back to root, the problem was solved. Thank
>>>>>> you again for your reply.
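The fix described above, as a short sketch (assuming a stock layout where "/" should be
owned by root:root):

    ls -ld /             # check the current owner first
    chown root:root /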
>>>>>>
>>>>>> I have another question. I had an uninstall failure, and there is no
>>>>>> button on the website for me to roll back, and I don't know what to do
>>>>>> about that. What should I do now to reinstall hadoop?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Aug 10, 2012 at 10:55 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Currently, the ambari installer requires everything to be run as root. It does not
>>>>>>> detect that the user is not root and use sudo, either on the master or on the agent nodes.
>>>>>>> Furthermore, it seems like it is failing when trying to use pdsh to make remote calls to
>>>>>>> the host list that you passed in, due to the errors mentioned in your script. This could
>>>>>>> be due to how it was installed, but I am not sure.
>>>>>>>
>>>>>>> Could you switch to become root and run any simple command on all hosts using pdsh?
>>>>>>> If you want to reference exactly how ambari uses pdsh, you can look into
>>>>>>> /usr/share/hmc/php/frontend/commandUtils.php
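A sketch of that sanity check, run as root on the master (the bracketed host range matches
the node names in this thread; pdsh's "-w ^file" form reads hostnames from a file, one per
line, so the hostdetail.txt path is just a placeholder):

    pdsh -w vbaby[2-6].cloud.eb uptime
    # or, equivalently, against the same list that was uploaded to ambari
    pdsh -w ^/path/to/hostdetail.txt uptime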
>>>>>>>
>>>>>>> thanks
>>>>>>> -- Hitesh
>>>>>>>
>>>>>>> On Aug 9, 2012, at 9:04 PM, xu peng wrote:
>>>>>>>
>>>>>>>> According to the error log, is there something wrong with my account?
>>>>>>>>
>>>>>>>> I installed all the dependency modules and ambari as the user
>>>>>>>> "ambari" instead of root. I added user "ambari" to /etc/sudoers
>>>>>>>> with no password required.
>>>>>>>>
>>>>>>>> On Fri, Aug 10, 2012 at 11:49 AM, xu peng <xupeng.bupt@gmail.com> wrote:
>>>>>>>>> There is no 100.log file in the /var/log/hmc dir, only a 55.log file (55
>>>>>>>>> is the biggest version num).
>>>>>>>>>
>>>>>>>>> The content of 55.log is:
>>>>>>>>> pdsh@vbaby1: module path "/usr/lib64/pdsh" insecure.
>>>>>>>>> pdsh@vbaby1: "/": Owner not root, current uid, or pdsh executable owner
>>>>>>>>> pdsh@vbaby1: Couldn't load any pdsh modules
>>>>>>>>>
>>>>>>>>> Thanks ~
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 10, 2012 at 11:36 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>> Sorry - my mistake. The last txn mentioned is 100, so please look for the 100.log file.
>>>>>>>>>>
>>>>>>>>>> -- Hitesh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Aug 9, 2012, at 8:34 PM, Hitesh Shah wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks - will take a look and get back to you.
>>>>>>>>>>>
>>>>>>>>>>> Could you also look at /var/log/hmc/hmc.txn.55.log and see if there are any errors in it?
>>>>>>>>>>>
>>>>>>>>>>> -- Hitesh.
>>>>>>>>>>>
>>>>>>>>>>> On Aug 9, 2012, at 8:00 PM, xu peng wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Hitesh :
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for your reply. I have followed all your suggestions on my
>>>>>>>>>>>> ambari server, and the results are below.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I can confirm that the hosts.txt file is empty after I failed at
>>>>>>>>>>>> the step finding reachable nodes.
>>>>>>>>>>>> 2. I tried making the hostdetails file on Win7 and on Red Hat; both
>>>>>>>>>>>> failed. (Please see the attachment, my hostdetails file.)
>>>>>>>>>>>> 3. I removed the logging re-direct and ran the .sh script. The script
>>>>>>>>>>>> seems to work well; it prints the hostname to the console and
>>>>>>>>>>>> generates a file (content is "0") in the same dir. (Please see the
>>>>>>>>>>>> attachment, the result and my .sh script.)
>>>>>>>>>>>> 4. I attached the hmc.log and error_log too. Hope this helps ~
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks ~
>>>>>>>>>>>> Xupeng
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 10, 2012 at 12:24 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>>> Xupeng, can you confirm that the hosts.txt file at /var/run/hmc/clusters/EBHadoop/hosts.txt is empty?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, can you ensure that the hostdetails file that you upload does not have any
>>>>>>>>>>>>> special characters that may be creating problems for the parsing layer?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the same dir, there should be an ssh.sh script. Can you create a copy of it,
>>>>>>>>>>>>> edit it to remove the logging re-directs to files, and run the script manually from
>>>>>>>>>>>>> the command line (it takes a hostname as the argument)? The output of that should
>>>>>>>>>>>>> show you what is going wrong.
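A sketch of that manual run, done on the master (the directory and script name come from this
thread; editing the copy by hand to drop the "> logfile 2>&1"-style redirects is the simplest
approach, and bash -x just traces each command as it executes):

    cd /var/run/hmc/clusters/EBHadoop
    cp ssh.sh ssh_debug.sh
    vi ssh_debug.sh                       # remove the logging redirects by hand
    bash -x ssh_debug.sh vbaby2.cloud.eb  # run against one host and watch the output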
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, please look at /var/log/hmc/hmc.log and httpd/error_log to see if there
>>>>>>>>>>>>> are any errors being logged which may shed more light on the issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 9, 2012, at 9:11 AM, Artem Ervits wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which file are you supplying in the step? Hostdetail.txt or hosts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: xupeng.bupt [mailto:xupeng.bupt@gmail.com]
>>>>>>>>>>>>>> Sent: Thursday, August 09, 2012 11:33 AM
>>>>>>>>>>>>>> To: ambari-user
>>>>>>>>>>>>>> Subject: Re: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your reply ~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I made only one hostdetail.txt file, which contains the names of all the servers,
>>>>>>>>>>>>>> and I submitted this file on the website, but I still have the same problem: I fail
>>>>>>>>>>>>>> at the step of finding reachable nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The error log is: "
>>>>>>>>>>>>>> [ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop root
>>>>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And my hostdetail.txt file is: "
>>>>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>>>>> "
>>>>>>>>>>>>>> Thank you very much ~
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2012-08-09
>>>>>>>>>>>>>> xupeng.bupt
>>>>>>>>>>>>>> From: Artem Ervits
>>>>>>>>>>>>>> Sent: 2012-08-09  22:16:53
>>>>>>>>>>>>>> To: ambari-user@incubator.apache.org
>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>> Subject: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>> The installer requires a hosts file, which I believe you called hostdetail. Make
>>>>>>>>>>>>>> sure it's the same file. You also mention a hosts.txt and a host.txt. You only need
>>>>>>>>>>>>>> one file with the names of all servers.
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: xu peng [mailto:xupeng.bupt@gmail.com]
>>>>>>>>>>>>>> Sent: Thursday, August 09, 2012 2:02 AM
>>>>>>>>>>>>>> To: ambari-user@incubator.apache.org
>>>>>>>>>>>>>> Subject: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>> Hi everyone :
>>>>>>>>>>>>>> I am trying to use ambari to set up a hadoop cluster, but I have run into a problem
>>>>>>>>>>>>>> on step 2. I already set up password-less ssh, and I created a hostdetail.txt file.
>>>>>>>>>>>>>> The problem is that the file "/var/run/hmc/clusters/EBHadoop/hosts.txt" is empty,
>>>>>>>>>>>>>> no matter how many times I submit the host.txt file on the website, and I really
>>>>>>>>>>>>>> don't know why.
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> Here is the log file: [2012:08:09
>>>>>>>>>>>>>> 05:17:56][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop root
>>>>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>>>>> and my host.txt is like this (vbaby1.cloud.eb is the master node):
>>>>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> Can anyone help me and tell me what I am doing wrong?
>>>>>>>>>>>>>> Thank you very much ~!
>>>>>>>>>>>>>
>>>>>>>>>>>> <hmcLog.txt><hostdetails.txt><httpdLog.txt><ssh1.sh><ssh1_result.jpg>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>> <DeployError1_2012.8.13.txt><log.rar><DeployingDetails.jpg><UninstallError1_2012.8.13.txt>
>>>
>
