incubator-ambari-user mailing list archives

From: Hitesh Shah <hit...@hortonworks.com>
Subject: Re: Problem when setting up hadoop cluster step 2
Date: Tue, 21 Aug 2012 15:09:25 GMT
I think removing the ambari and mod_passenger rpms from all the nodes except the ambari master
should suffice. 
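
For example, something along these lines on each node other than the ambari master should
do it (a rough sketch -- check what is actually installed first, since package names can vary):

  rpm -qa | grep -iE 'ambari|passenger'   # see which of the rpms are present
  yum remove -y ambari mod_passenger      # leave these in place on the ambari master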

-- Hitesh 

On Aug 19, 2012, at 5:46 PM, xu peng wrote:

> No, I am not using vbaby1 as my new ambari master. It is the
> former ambari master (I did not uninstall the dependency packages), and
> vbaby3 is the current ambari master.
> 
> So do I have to uninstall the mod_passenger package on the slave nodes,
> or run the ganglia server on the same node as the ambari server?
> 
> 
> On Mon, Aug 20, 2012 at 1:30 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>> Yes - not using /dev/mapper/hdvg-rootlv was what I was planning to suggest.
>> 
>> It seems to me that you installed mod_passenger and/or ambari on vbaby1. Is this
>> your new ambari master?
>> 
>> Try doing this on vbaby1:
>> 
>> $puppet master --no-daemonize --debug
>> 
>> The above will create the cert required by the puppet master running in httpd. Kill
>> the above process ( it will run in the foreground ).
>> 
>> Now, try the httpd restart.
>> 
>> ( Also, note that you should not need to do anything for ganglia start unless you
>> are running ganglia server on the same host as ambari. )
>> 
>> thanks
>> -- Hitesh
>> 
>> 
>> On Aug 19, 2012, at 6:03 AM, xu peng wrote:
>> 
>>> Hi Hitesh :
>>> 
>>> It is me again.
>>> 
>>> I figured out the previous problem by changing the mount point to a custom path.
>>> 
>>> But I failed at the step of starting the ganglia server.
>>> 
>>> I ran this command manually on the vbaby1 node, but it failed. The other
>>> nodes succeeded.
>>> ([root@vbaby1 log]# service httpd start
>>> Starting httpd: Syntax error on line 37 of /etc/httpd/conf.d/puppetmaster.conf:
>>> SSLCertificateChainFile: file '/var/lib/puppet/ssl/ca/ca_crt.pem' does
>>> not exist or is empty
>>>                                                          [FAILED])
>>> 
>>> Please refer to the error log.
>>> 
>>> Thanks a lot.
>>> 
>>> On Sun, Aug 19, 2012 at 7:26 PM, xu peng <xupeng.bupt@gmail.com> wrote:
>>>> Hi Hitesh :
>>>> 
>>>> It is me again.
>>>> 
>>>> I encountered another problem while deploying the service. According
>>>> to the log, it seems like something went wrong when executing the
>>>> command (Dependency Exec[mkdir -p
>>>> /dev/mapper/hdvg-rootlv/hadoop/hdfs/data] has failures: true).
>>>> 
>>>> Please refer to the attachment. It seems like all the rpm packages
>>>> installed successfully, and I don't know where the dependency failed.
>>>> 
>>>> Please help , thanks a lot.
>>>> 
>>>> On Sun, Aug 19, 2012 at 8:08 AM, xu peng <xupeng.bupt@gmail.com> wrote:
>>>>> Hi Hitesh :
>>>>> 
>>>>> I used the default settings for the mount point, but it seems this
>>>>> path (/dev/mapper/hdvg-rootlv/) is not a directory, and I cannot
>>>>> run mkdir -p on it. hdvg-rootlv is a block device file
>>>>> (brwxrwxrwx). Is there something wrong?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Aug 19, 2012 at 3:38 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>> Hi
>>>>>> 
>>>>>> Yes - you should install all packages from the new repo and none from the
>>>>>> old repo. Most of the packages should be the same, but some, like hadoop-lzo,
>>>>>> were re-factored to work correctly with respect to 32/64-bit installs on RHEL6.
>>>>>> 
>>>>>> Regarding the mount points: from a hadoop point of view, the namenode and
>>>>>> datanode dirs are just dirs. From a performance point of view, you want each
>>>>>> dir to be created on a separate mount point to increase disk I/O bandwidth.
>>>>>> This means that the mount points you select on the UI should allow directories
>>>>>> to be created. If you have mounted certain kinds of filesystems which you do
>>>>>> not wish to use for hadoop ( any tmpfs, nfs mounts etc ), you should de-select
>>>>>> them on the UI and/or use the custom mount point text box as appropriate.
>>>>>> The UI currently does not distinguish valid from invalid mount points, so it
>>>>>> is up to the user to select correctly.
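>>>>>> 
>>>>>> As a rough illustration only ( the /grid/N paths below are made-up example
>>>>>> mount points, not anything the installer chooses for you ):
>>>>>> 
>>>>>> df -hT | grep -vE 'tmpfs|nfs'                                 # list real, writable mount points
>>>>>> mkdir -p /grid/0/hadoop/hdfs/data /grid/1/hadoop/hdfs/data    # one data dir per disk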
>>>>>> 
>>>>>> -- Hitesh
>>>>>> 
>>>>>> 
>>>>>> On Aug 18, 2012, at 9:48 AM, xu peng wrote:
>>>>>> 
>>>>>>> Hi Hitesh:
>>>>>>> 
>>>>>>> Thanks again for your reply.
>>>>>>> 
>>>>>>> I solved the dependency problem after updating the hdp repo.
>>>>>>> 
>>>>>>> But here come two new problems:
>>>>>>> 1. I updated to the new hdp repo, but I had created a local repo copy of the
>>>>>>> old hdp repo, and I installed all the rpm packages except
>>>>>>> hadoop-lzo-native using the old hdp repo. It seems like
>>>>>>> hadoop-lzo-native has some conflict with hadoop-lzo. So, do I have to
>>>>>>> install all the rpm packages from the new repo?
>>>>>>> 
>>>>>>> 2. From the error log, I can see a command "mkdir -p /var/.../..
>>>>>>> (mount point of hadoop)", but I found the mount point is not a
>>>>>>> dir, but a block device file (brwxrwxrwx). And the execution of this
>>>>>>> step failed. Did I do something wrong?
>>>>>>> 
>>>>>>> I am sorry, but the deploy error log is on my company's computer, and
>>>>>>> I will upload it in my next email.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> -- Xupeng
>>>>>>> 
>>>>>>> On Sat, Aug 18, 2012 at 4:43 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>> Hi again,
>>>>>>>> 
>>>>>>>> You are actually hitting a problem caused by some changes in the code which
>>>>>>>> require a modified repo. Unfortunately, I got delayed in modifying the
>>>>>>>> documentation to point to the new repo.
>>>>>>>> 
>>>>>>>> Could you try using
>>>>>>>> http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos/centos5/hdp-release-1.0.1.14-1.el5.noarch.rpm
>>>>>>>> or
>>>>>>>> http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos/centos6/hdp-release-1.0.1.14-1.el6.noarch.rpm
>>>>>>>> 
>>>>>>>> The above should install the yum repo configs to point to the correct repo
>>>>>>>> which will have the lzo packages.
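>>>>>>>> 
>>>>>>>> Roughly ( centos5 shown; use the el6 rpm on RHEL/CentOS 6 -- adjust as needed ):
>>>>>>>> 
>>>>>>>> rpm -Uvh http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos/centos5/hdp-release-1.0.1.14-1.el5.noarch.rpm
>>>>>>>> yum clean all                    # drop cached metadata from the old repo
>>>>>>>> yum install hadoop-lzo-native    # the lzo packages should now resolve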
>>>>>>>> 
>>>>>>>> -- Hitesh
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Aug 16, 2012, at 9:27 PM, xu peng wrote:
>>>>>>>> 
>>>>>>>>> Hitesh Shah :
>>>>>>>>> 
>>>>>>>>> It is my pleasure to file an ambari jira to help other users. As
>>>>>>>>> a matter of fact, I want to summarize all the problems before I install
>>>>>>>>> the ambari cluster successfully, and I will feed back as soon as
>>>>>>>>> possible.
>>>>>>>>> 
>>>>>>>>> Here is another problem I encountered when installing hadoop using ambari:
>>>>>>>>> I found that an rpm package, "hadoop-lzo-native", is not in the hdp repo
>>>>>>>>> (baseurl=http://public-repo-1.hortonworks.com/HDP-1.0.13/repos/centos5),
>>>>>>>>> so I failed again during the deploy step.
>>>>>>>>> 
>>>>>>>>> The attachment is the deploy log, please refer to it.
>>>>>>>>> 
>>>>>>>>> Thanks a lot and I look forward to your reply.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Aug 14, 2012 at 11:35 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>> Ok - the cert issue is sometimes a result of uninstalling and
>>>>>>>>>> re-installing ambari agents.
>>>>>>>>>> 
>>>>>>>>>> The re-install causes the ambari agents to regenerate a new certificate,
>>>>>>>>>> and if the master was bootstrapped earlier, it would still be trying
>>>>>>>>>> to match against the old certs.
>>>>>>>>>> 
>>>>>>>>>> Stop the ambari master and remove the ambari-agent rpm from all hosts.
>>>>>>>>>> 
>>>>>>>>>> To fix this:
>>>>>>>>>> - on the master, do a puppet cert revoke for all hosts
>>>>>>>>>>   ( http://docs.puppetlabs.com/man/cert.html )
>>>>>>>>>> - you can do a cert list to get all signed or non-signed hosts
>>>>>>>>>> 
>>>>>>>>>> On all hosts, delete the following dirs ( if they exist ):
>>>>>>>>>> - /etc/puppet/ssl
>>>>>>>>>> - /etc/puppet/[master|agent]/ssl/
>>>>>>>>>> - /var/lib/puppet/ssl/
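>>>>>>>>>> 
>>>>>>>>>> Something like this should work ( a sketch -- the hostname is illustrative,
>>>>>>>>>> repeat the revoke for each of your hosts ):
>>>>>>>>>> 
>>>>>>>>>> # on the master
>>>>>>>>>> puppet cert list --all
>>>>>>>>>> puppet cert revoke vbaby2.cloud.eb
>>>>>>>>>> 
>>>>>>>>>> # on every host
>>>>>>>>>> rm -rf /etc/puppet/ssl /etc/puppet/master/ssl /etc/puppet/agent/ssl /var/lib/puppet/ssl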
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> After doing the above, re-install the ambari agent.
>>>>>>>>>> 
>>>>>>>>>> On the ambari master, stop the master. Run the following command:
>>>>>>>>>> 
>>>>>>>>>> puppet master --no-daemonize --debug
>>>>>>>>>> 
>>>>>>>>>> The above runs in the foreground. The reason to run this is to make sure
>>>>>>>>>> the cert for the master is recreated, as we deleted it earlier.
>>>>>>>>>> 
>>>>>>>>>> Now, kill the above process running in the foreground and do a
>>>>>>>>>> service ambari start to bring up the UI.
>>>>>>>>>> 
>>>>>>>>>> You should be able to bootstrap from this point on.
>>>>>>>>>> 
>>>>>>>>>> Would you mind filing a jira and mentioning all the various issues you
>>>>>>>>>> have come across and how you solved them? We can use that to create an
>>>>>>>>>> FAQ for other users.
>>>>>>>>>> 
>>>>>>>>>> thanks
>>>>>>>>>> -- Hitesh
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Aug 14, 2012, at 1:55 AM, xu peng wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi  Hitesh :
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot for your reply.
>>>>>>>>>>> 
>>>>>>>>>>> 1. I did a puppet kick --ping to the clients from my ambari master;
>>>>>>>>>>> all five nodes failed with the same log (Triggering
>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>> Host vbaby2.cloud.eb failed: certificate verify failed.  This is often
>>>>>>>>>>> because the time is out of sync on the server or client
>>>>>>>>>>> vbaby2.cloud.eb finished with exit code 2)
>>>>>>>>>>> 
>>>>>>>>>>> I manually ran "service ambari-agent start"; is that necessary? How
>>>>>>>>>>> can I fix this problem?
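>>>>>>>>>>> 
>>>>>>>>>>> ( Since the error mentions clock skew, maybe comparing the clocks is worth
>>>>>>>>>>> a try -- just an idea, I have not confirmed this is the cause:
>>>>>>>>>>> 
>>>>>>>>>>> pdsh -w vbaby[1-6].cloud.eb date    # do all nodes agree?
>>>>>>>>>>> ntpdate -q pool.ntp.org             # offset against a public NTP pool
>>>>>>>>>>> )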
>>>>>>>>>>> 
>>>>>>>>>>> 2. As you suggested, I ran the yum command manually and found that the
>>>>>>>>>>> installation missed a dependency - php-gd - and I had to update my
>>>>>>>>>>> yum repo.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Aug 14, 2012 at 1:01 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>> Based on your deploy error log:
>>>>>>>>>>>> 
>>>>>>>>>>>> "3": {
>>>>>>>>>>>>    "nodeReport": {
>>>>>>>>>>>>        "PUPPET_KICK_FAILED": [],
>>>>>>>>>>>>        "PUPPET_OPERATION_FAILED": [
>>>>>>>>>>>>            "vbaby3.cloud.eb",
>>>>>>>>>>>>            "vbaby5.cloud.eb",
>>>>>>>>>>>>            "vbaby4.cloud.eb",
>>>>>>>>>>>>            "vbaby2.cloud.eb",
>>>>>>>>>>>>            "vbaby6.cloud.eb",
>>>>>>>>>>>>            "vbaby1.cloud.eb"
>>>>>>>>>>>>        ],
>>>>>>>>>>>>        "PUPPET_OPERATION_TIMEDOUT": [
>>>>>>>>>>>>            "vbaby5.cloud.eb",
>>>>>>>>>>>>            "vbaby4.cloud.eb",
>>>>>>>>>>>>            "vbaby2.cloud.eb",
>>>>>>>>>>>>            "vbaby6.cloud.eb",
>>>>>>>>>>>>            "vbaby1.cloud.eb"
>>>>>>>>>>>>        ],
>>>>>>>>>>>> 
>>>>>>>>>>>> 5 nodes timed out, which means the puppet agent is not running on them
>>>>>>>>>>>> or they cannot communicate with the master. Try doing a puppet
>>>>>>>>>>>> kick --ping to them from the master.
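>>>>>>>>>>>> 
>>>>>>>>>>>> For example ( hostname illustrative, repeat per host ):
>>>>>>>>>>>> 
>>>>>>>>>>>> puppet kick --ping vbaby2.cloud.eb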
>>>>>>>>>>>> 
>>>>>>>>>>>> For the one which failed, it failed at
>>>>>>>>>>>> 
>>>>>>>>>>>> "\"Mon Aug 13 11:54:17 +0800 2012 /Stage[1]/Hdp::Pre_install_pkgs/Hdp::Exec[yum
install $pre_installed_pkgs]/Exec[yum install $pre_installed_pkgs]/returns (err): change from
notrun to 0 failed: yum install -y hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin
hadoop-lzo hadoop hadoop-libhdfs hadoop-native hadoop-pipes hadoop-sbin hadoop-lzo hdp_mon_dashboard
ganglia-gmond-3.2.0 gweb hdp_mon_ganglia_addons snappy snappy-devel returned 1 instead of
one of [0] at /etc/puppet/agent/modules/hdp/manifests/init.pp:265\"",
>>>>>>>>>>>> 
>>>>>>>>>>>> It seems like yum install failed on the host. Try running the command
>>>>>>>>>>>> manually and see what the error is.
>>>>>>>>>>>> 
>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Aug 13, 2012, at 2:28 AM, xu peng wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Hitesh :
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It's me again.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Following your advice, I reinstalled the ambari server. But deploying
>>>>>>>>>>>>> the cluster and uninstalling the cluster failed again. I really don't
>>>>>>>>>>>>> know why.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I attached an archive which contains the logs of all the nodes in
>>>>>>>>>>>>> my cluster (/var/log/puppet_*.log, /var/log/puppet/*.log,
>>>>>>>>>>>>> /var/log/yum.log, /var/log/hmc/hmc.log). vbaby3.cloud.eb is the
>>>>>>>>>>>>> ambari server. Please refer to it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The attachments DeployError and UninstallError are the logs shown by the
>>>>>>>>>>>>> ambari web UI on failure, and the attachment DeployingDetails.jpg shows
>>>>>>>>>>>>> the deploy details of my cluster. Please refer to them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks again for your patience! I look forward to your reply.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Xupeng
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Aug 11, 2012 at 10:56 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>>>> For uninstall failures, you will need to do a couple of things. Depending
>>>>>>>>>>>>>> on where the uninstall failed, you may have to manually do a killall java
>>>>>>>>>>>>>> on all the nodes to kill any missed processes. If you want to start with
>>>>>>>>>>>>>> a complete clean install, you should also delete the hadoop dir in the
>>>>>>>>>>>>>> mount points you selected during the previous install so that the new
>>>>>>>>>>>>>> fresh install does not face errors when it tries to re-format hdfs.
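>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Roughly, on each node ( a sketch -- the /grid/* path is only an example;
>>>>>>>>>>>>>> use whichever mount points you actually selected ):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> killall java             # stop any hadoop processes the uninstall missed
>>>>>>>>>>>>>> rm -rf /grid/*/hadoop    # remove the old hdfs dirs on each mount point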
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> After that, simply uninstall and re-install the ambari rpm, and that
>>>>>>>>>>>>>> should allow you to re-create a fresh cluster.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Aug 11, 2012, at 2:34 AM, xu peng wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Hitesh :
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks a lot for your reply.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I solved this problem; it was a silly mistake. Someone had changed the
>>>>>>>>>>>>>>> owner of the "/" dir, and according to the error log, pdsh needs root to
>>>>>>>>>>>>>>> proceed.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> After changing the owner of "/" back to root, the problem was solved.
>>>>>>>>>>>>>>> Thank you again for your reply.
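>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ( i.e. roughly:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ls -ld /             # check the current owner of the root dir
>>>>>>>>>>>>>>> chown root:root /    # restore root ownership so pdsh will load its modules
>>>>>>>>>>>>>>> )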
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have another question. I had an uninstall failure, and there is no
>>>>>>>>>>>>>>> button on the website for me to roll back, and I don't know what to do
>>>>>>>>>>>>>>> about that. What should I do now to reinstall hadoop?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Aug 10, 2012 at 10:55 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently, the ambari installer requires everything to be run as root.
>>>>>>>>>>>>>>>> It does not detect that the user is not root and use sudo, either on the
>>>>>>>>>>>>>>>> master or on the agent nodes.
>>>>>>>>>>>>>>>> Furthermore, it seems like it is failing when trying to use pdsh to make
>>>>>>>>>>>>>>>> remote calls to the host list that you passed in, due to the errors
>>>>>>>>>>>>>>>> mentioned in your script. This could be due to how it was installed, but
>>>>>>>>>>>>>>>> I am not sure.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Could you switch to root and run any simple command on all hosts using
>>>>>>>>>>>>>>>> pdsh? If you want to see exactly how ambari uses pdsh, you can look into
>>>>>>>>>>>>>>>> /usr/share/hmc/php/frontend/commandUtils.php
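>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For example ( hostnames as in your cluster ):
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> pdsh -w vbaby[1-6].cloud.eb uptime    # run as root; expect one line per host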
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Aug 9, 2012, at 9:04 PM, xu peng wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> According to the error log, is there something wrong with my account?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I installed all the dependency modules and ambari as the user
>>>>>>>>>>>>>>>>> "ambari" instead of root. I added user "ambari" to /etc/sudoers
>>>>>>>>>>>>>>>>> with no passwd.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Aug 10, 2012 at 11:49 AM, xu peng <xupeng.bupt@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> There is no 100.log file in the /var/log/hmc dir, only a 55.log file
>>>>>>>>>>>>>>>>>> (55 is the biggest version num).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The content of 55.log is:
>>>>>>>>>>>>>>>>>> pdsh@vbaby1: module path "/usr/lib64/pdsh" insecure.
>>>>>>>>>>>>>>>>>> pdsh@vbaby1: "/": Owner not root, current uid, or pdsh executable owner
>>>>>>>>>>>>>>>>>> pdsh@vbaby1: Couldn't load any pdsh modules
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks ~
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Aug 10, 2012 at 11:36 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>> Sorry - my mistake. The last txn mentioned is 100, so please look for
>>>>>>>>>>>>>>>>>>> the 100.log file.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Aug 9, 2012, at 8:34 PM, Hitesh Shah wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks - will take a look and get back to you.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Could you also look at /var/log/hmc/hmc.txn.55.log and see if there
>>>>>>>>>>>>>>>>>>>> are any errors in it?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -- Hitesh.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Aug 9, 2012, at 8:00 PM, xu peng wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Hitesh:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks a lot for your reply. I have done all your suggestions on my
>>>>>>>>>>>>>>>>>>>>> ambari server, and the results are as below.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. I can confirm that the hosts.txt file is empty after I fail at
>>>>>>>>>>>>>>>>>>>>> the step of finding reachable nodes.
>>>>>>>>>>>>>>>>>>>>> 2. I tried making the hostdetails file on win7 and on redhat; both
>>>>>>>>>>>>>>>>>>>>> failed. (Please see the attachment, my hostdetails file.)
>>>>>>>>>>>>>>>>>>>>> 3. I removed the logging re-directs and ran the .sh script. It seems
>>>>>>>>>>>>>>>>>>>>> like the script works well; it prints the hostname on the console and
>>>>>>>>>>>>>>>>>>>>> generates a file (content is "0") in the same dir. (Please see the
>>>>>>>>>>>>>>>>>>>>> attachments, the result and my .sh script.)
>>>>>>>>>>>>>>>>>>>>> 4. I attached the hmc.log and error_log too. Hope this helps ~
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks ~
>>>>>>>>>>>>>>>>>>>>> Xupeng
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 10, 2012 at 12:24 AM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Xupeng, can you confirm that the hosts.txt file at
>>>>>>>>>>>>>>>>>>>>>> /var/run/hmc/clusters/EBHadoop/hosts.txt is empty?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Also, can you ensure that the hostdetails file that you upload does not
>>>>>>>>>>>>>>>>>>>>>> have any special characters that may be creating problems for the
>>>>>>>>>>>>>>>>>>>>>> parsing layer?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> In the same dir, there should be an ssh.sh script. Can you create a copy
>>>>>>>>>>>>>>>>>>>>>> of it, edit it to remove the logging re-directs to files, and run the
>>>>>>>>>>>>>>>>>>>>>> script manually from the command-line ( it takes a hostname as the
>>>>>>>>>>>>>>>>>>>>>> argument )? The output of that should show you what is going wrong.
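>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> For example ( the copy name and hostname are illustrative ):
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> cd /var/run/hmc/clusters/EBHadoop
>>>>>>>>>>>>>>>>>>>>>> cp ssh.sh ssh_debug.sh             # then edit ssh_debug.sh to remove the log redirects
>>>>>>>>>>>>>>>>>>>>>> sh ssh_debug.sh vbaby2.cloud.eb    # errors now print straight to the terminal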
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Also, please look at /var/log/hmc/hmc.log and httpd/error_log to see if
>>>>>>>>>>>>>>>>>>>>>> there are any errors being logged which may shed more light on the issue.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>>>>>>>>>> -- Hitesh
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Aug 9, 2012, at 9:11 AM, Artem Ervits wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Which file are you supplying in the step? Hostdetail.txt or hosts?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> From: xupeng.bupt [mailto:xupeng.bupt@gmail.com]
>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, August 09, 2012 11:33 AM
>>>>>>>>>>>>>>>>>>>>>>> To: ambari-user
>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thank you for your reply ~
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I made only one hostdetail.txt file, which contains the names of all servers,
>>>>>>>>>>>>>>>>>>>>>>> and I submitted this file on the website, but I still have the same problem.
>>>>>>>>>>>>>>>>>>>>>>> I failed at the step of finding reachable nodes.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> The error log is: "
>>>>>>>>>>>>>>>>>>>>>>> [ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>>>>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>>>>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop root
>>>>>>>>>>>>>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> And my hostdetail.txt file is: "
>>>>>>>>>>>>>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>>>>>>> Thank you very much ~
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 2012-08-09
>>>>>>>>>>>>>>>>>>>>>>> xupeng.bupt
>>>>>>>>>>>>>>>>>>>>>>> From: Artem Ervits
>>>>>>>>>>>>>>>>>>>>>>> Sent: 2012-08-09 22:16:53
>>>>>>>>>>>>>>>>>>>>>>> To: ambari-user@incubator.apache.org
>>>>>>>>>>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>>>>>>>>>>> Subject: RE: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>>>>>>>>>>> The installer requires a hosts file, which I believe you called hostdetail.
>>>>>>>>>>>>>>>>>>>>>>> Make sure it's the same file. You also mention a hosts.txt and a host.txt.
>>>>>>>>>>>>>>>>>>>>>>> You only need one file with the names of all servers.
>>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>>> From: xu peng [mailto:xupeng.bupt@gmail.com]
>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, August 09, 2012 2:02 AM
>>>>>>>>>>>>>>>>>>>>>>> To: ambari-user@incubator.apache.org
>>>>>>>>>>>>>>>>>>>>>>> Subject: Problem when setting up hadoop cluster step 2
>>>>>>>>>>>>>>>>>>>>>>> Hi everyone:
>>>>>>>>>>>>>>>>>>>>>>> I am trying to use ambari to set up a hadoop cluster, but I have run into a
>>>>>>>>>>>>>>>>>>>>>>> problem at step 2. I have already set up password-less ssh, and I created a
>>>>>>>>>>>>>>>>>>>>>>> hostdetail.txt file.
>>>>>>>>>>>>>>>>>>>>>>> The problem is that I found the file
>>>>>>>>>>>>>>>>>>>>>>> "/var/run/hmc/clusters/EBHadoop/hosts.txt" is empty, no matter how many times
>>>>>>>>>>>>>>>>>>>>>>> I submit the host.txt file on the website, and I really don't know why.
>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>> Here is the log file: [2012:08:09
>>>>>>>>>>>>>>>>>>>>>>> 05:17:56][ERROR][sequentialScriptExecutor][sequentialScriptRunner.php:272][]:
>>>>>>>>>>>>>>>>>>>>>>> Encountered total failure in transaction 100 while running cmd:
>>>>>>>>>>>>>>>>>>>>>>> /usr/bin/php ./addNodes/findSshableNodes.php with args: EBHadoop root
>>>>>>>>>>>>>>>>>>>>>>> 35 100 36 /var/run/hmc/clusters/EBHadoop/hosts.txt
>>>>>>>>>>>>>>>>>>>>>>> and my host.txt is like this (vbaby1.cloud.eb is the master node):
>>>>>>>>>>>>>>>>>>>>>>> vbaby2.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby3.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby4.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby5.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> vbaby6.cloud.eb
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>> Can anyone help me and tell me what I am doing wrong?
>>>>>>>>>>>>>>>>>>>>>>> Thank you very much ~!
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> <hmcLog.txt><hostdetails.txt><httpdLog.txt><ssh1.sh><ssh1_result.jpg>
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> <DeployError1_2012.8.13.txt><log.rar><DeployingDetails.jpg><UninstallError1_2012.8.13.txt>
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> <deployError2012.8.17.txt>
>>>>>>>> 
>>>>>> 
>>> <gangliaStartError.txt><4.jpg>
>> 

