ambari-user mailing list archives

From David Novogrodsky <david.novogrod...@gmail.com>
Subject Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
Date Wed, 17 Dec 2014 15:05:11 GMT
I hope this clears up the confusion:

First, this is what the hosts file looks like:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com namenode
192.168.200.107 datanode01.localdomain.com
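
A hosts file like the one above can be sanity-checked with a short script. What follows is only a sketch in Python 3 (the cluster itself ships Python 2.6). The function names `parse_hosts` and `find_problems` are made up for illustration, and the two rules it checks are the ones suggested in the quoted replies below: keep only localhost names on the loopback lines, and list a fully qualified name first for every other address. Treating "contains a dot" as "fully qualified" is a rough proxy, not a real DNS check.

```python
# Names that normally belong on the loopback lines of a CentOS 6 hosts file.
LOOPBACK_NAMES = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
}


def parse_hosts(text):
    """Return a list of (ip, [names]) tuples from hosts-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def find_problems(entries):
    """Flag entries that break the two rules described in the lead-in."""
    problems = []
    for ip, names in entries:
        if ip.startswith("127.") or ip == "::1":
            extra = [n for n in names if n not in LOOPBACK_NAMES]
            if extra:
                problems.append("%s: unexpected names %s" % (ip, extra))
        elif not names or "." not in names[0]:
            problems.append("%s: first name is not fully qualified" % ip)
    return problems


SAMPLE = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com namenode
192.168.200.107 datanode01.localdomain.com
"""

if __name__ == "__main__":
    for problem in find_problems(parse_hosts(SAMPLE)) or ["no problems found"]:
        print(problem)
```

On a real host, the text would come from reading /etc/hosts instead of the `SAMPLE` string.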

I have restarted all the nodes in the cluster several times.
I can reach every node from every other node by pinging its fully qualified
domain name.
I can reach the data nodes from the name node using password-less ssh.
I have changed the hostname of each machine to match its name in the hosts
file on each machine.
I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.
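
A note on the ambari-agent.ini check: the agent appears to take the server name from the `hostname` key in the `[server]` section of that file rather than from /etc/hosts, so that value is the one worth comparing against `hostname -f` on the server. A rough Python 3 sketch follows. The sample file contents and the helper name `server_hostname` are illustrative assumptions, not copied from any of the machines in this thread.

```python
import configparser

# Illustrative contents only; a real ambari-agent.ini has more sections.
SAMPLE_INI = """\
[server]
hostname=namenode.localdomain
url_port=8440
secured_url_port=8441
"""


def server_hostname(ini_text):
    """Return the server hostname the agent is configured to contact."""
    # RawConfigParser avoids '%' interpolation surprises in real files.
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    return parser.get("server", "hostname")


if __name__ == "__main__":
    # On a real host, pass in the text of
    # /etc/ambari-agent/conf/ambari-agent.ini instead of SAMPLE_INI.
    print(server_hostname(SAMPLE_INI))
```

If the value printed differs from what `hostname -f` reports on the Ambari server, that mismatch is a plausible place to start looking.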

David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Tue, Dec 16, 2014 at 10:20 PM, Devopam Mittra <devopam@gmail.com> wrote:
>
> Forgive me if I sound rude, but please re-read the installation
> instructions carefully; they should help in your case.
>
> 1. Have a sound naming convention for all your boxes, e.g.
> namenode01.localdomain, datanode01.localdomain, datanode0N.localdomain.
> This will help a lot with future expansion and maintenance of your
> cluster.
> 2. Do not, by any means, tamper with the /etc/hosts entries for 127.0.0.1
> and ::1. Leave them as localhost only. Changing them causes problems
> for every internal lookup the OS performs and makes normal operation of
> the box harder to maintain.
> 3. If you have DHCP plus a very good DNS server in place, fine; otherwise
> assign static IPs to your machines and create one entry per box with
> the FQDN and static IP address, replicated on ALL the boxes.
> 4. Set up keyless ssh login for root, or for whichever uniform local user
> you want to use to manage Ambari and Hadoop.
> 5. Confirm that the namenode and the Ambari server machine (in case they are
> different for you) can talk to ALL the machines using a keyless login for
> the uniform user from step 4.
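
Steps 4 and 5 can be checked non-interactively. Below is a sketch in Python 3, assuming the OpenSSH client is installed. `BatchMode=yes` makes ssh fail rather than prompt for a password, so a zero exit code suggests the keyless login works. The helper names are placeholders invented for this example.

```python
import subprocess


def ssh_check_command(user, host, timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=%d" % timeout,
        "-l", user,
        host,
        "true",  # run a no-op on the remote side
    ]


def keyless_login_works(user, host):
    """Return True if ssh exits 0 without needing a password."""
    return subprocess.call(ssh_check_command(user, host)) == 0
```

Calling `keyless_login_works("root", "datanode01.localdomain.com")` from the Ambari server host, once per node, would cover step 5. Note that BatchMode also refuses to ask about unknown host keys, so a first-time connection can fail for that reason alone.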
>
> hope the above will help you to sort out the issue in a single go.
>
> regards
> Dev
>
>
>
> On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky <
> david.novogrodsky@gmail.com> wrote:
>>
>> There is nothing simply done in Ambari.  :)
>>
>> After changing the name of this computer and restarting the namenode, Ambari
>> does not recognize any node.  The main error I am wondering about is this:
>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at
>> https://namenode.localdomain:8440 (98.124.198.1)
>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to
>> https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to
>> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
>> refused
>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at
>> https://namenode.localdomain:8440 is not reachable, sleeping for 10
>> seconds...
>> ', None)
>> Why is Ambari using namenode.localdomain to connect?
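
One detail in the log above may be relevant: the agent resolved namenode.localdomain to 98.124.198.1, which is outside the 192.168.200.x range used in the hosts file, and the hosts file quoted just below lists namenode.localdomain.com rather than namenode.localdomain. That would be consistent with the name being answered by an outside DNS server instead of /etc/hosts, although the log alone does not prove it. A small Python 3 sketch for spotting this kind of mismatch uses the standard `ipaddress` module; the helper name `looks_internal` is made up here.

```python
import ipaddress


def looks_internal(address):
    """True if the address is in a private or loopback range."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback


if __name__ == "__main__":
    # The first address is the one from the agent log, the second is the
    # namenode address from the hosts file.
    for addr in ("98.124.198.1", "192.168.200.143"):
        print(addr, looks_internal(addr))
```

A cluster hostname that resolves to an address where `looks_internal` is false is worth a second look at /etc/hosts and the name configured for the agent.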
>>
>> I am running Ambari on this node; I am running Ambari on the namenode of
>> this cluster.  The host file for this computer is this:
>>
>> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
>> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>> 192.168.200.144 localhost.datanode10
>> 192.168.200.107 localhost.datanode01
>> 192.168.200.143 namenode.localdomain.com namenode
>>
>> The Ambari wizard said I needed to use fully qualified domain names, so
>>
>> What follows is the detailed registration log.  I get this error
>> in the registration log for namenode.localdomain.com:
>> --
>> ==========================
>> Creating target directory...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Copying common functions script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> scp /usr/lib/python2.6/site-packages/ambari_commons
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Copying OS type check script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Running OS type check...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>> Cluster primary/cluster OS type is redhat6 and local/current OS type is
>> redhat6
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:19
>>
>> ==========================
>> Checking 'sudo' package on remote host...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:19
>> sudo-1.8.6p3-15.el6.x86_64
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:20
>>
>> ==========================
>> Copying repo file to 'tmp' folder...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:20
>>
>> scp /etc/yum.repos.d/ambari.repo
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:20
>>
>> ==========================
>> Moving file to repo dir...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:20
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:21
>>
>> ==========================
>> Copying setup script file...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:21
>>
>> scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:21
>>
>> ==========================
>> Running setup agent script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:21
>> Verifying Python version compatibility...
>> Using python  /usr/bin/python2.6
>> Found ambari-agent PID: 5036
>> Stopping ambari-agent
>> Removing PID file at /var/run/ambari-agent/ambari-agent.pid
>> ambari-agent successfully stopped
>> Restarting ambari-agent
>> Verifying Python version compatibility...
>> Using python  /usr/bin/python2.6
>> ambari-agent is not running. No PID found at
>> /var/run/ambari-agent/ambari-agent.pid
>> Verifying Python version compatibility...
>> Using python  /usr/bin/python2.6
>> Checking for previously running Ambari Agent...
>> Starting ambari-agent
>> Verifying ambari-agent process status...
>> Ambari Agent successfully started
>> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
>> Agent out at: /var/log/ambari-agent/ambari-agent.out
>> Agent log at: /var/log/ambari-agent/ambari-agent.log
>> ('WARNING 2014-12-16 12:01:59,642 NetUtil.py:92 - Server at
>> https://namenode.localdomain:8440 is not reachable, sleeping for 10
>> seconds...
>> INFO 2014-12-16 12:02:09,653 NetUtil.py:48 - Connecting to
>> https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:71 - Failed to connect to
>> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
>> refused
>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:92 - Server at
>> https://namenode.localdomain:8440 is not reachable, sleeping for 10
>> seconds...
>> INFO 2014-12-16 12:02:19,711 NetUtil.py:48 - Connecting to
>> https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:71 - Failed to connect to
>> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
>> refused
>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:92 - Server at
>> https://namenode.localdomain:8440 is not reachable, sleeping for 10
>> seconds...
>> INFO 2014-12-16 12:02:22,680 main.py:83 - loglevel=logging.INFO
>> INFO 2014-12-16 12:02:22,681 main.py:55 - signal received, exiting.
>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:39 - Removing pid file
>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:46 - Removing temp files
>> INFO 2014-12-16 12:02:29,532 main.py:83 - loglevel=logging.INFO
>> INFO 2014-12-16 12:02:29,533 DataCleaner.py:36 - Data cleanup thread
>> started
>> INFO 2014-12-16 12:02:29,534 DataCleaner.py:117 - Data cleanup started
>> INFO 2014-12-16 12:02:29,542 DataCleaner.py:119 - Data cleanup finished
>> INFO 2014-12-16 12:02:29,667 PingPortListener.py:51 - Ping port listener
>> started on port: 8670
>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at
>> https://namenode.localdomain:8440 (98.124.198.1)
>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to
>> https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to
>> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
>> refused
>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at
>> https://namenode.localdomain:8440 is not reachable, sleeping for 10
>> seconds...
>> ', None)
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:32
>>
>> Registering with the server...
>> Registration with the server failed.
>> ----
>>
>> David Novogrodsky
>> david.novogrodsky@gmail.com
>> http://www.linkedin.com/in/davidnovogrodsky
>>
>> On Mon, Dec 15, 2014 at 10:02 PM, Devopam Mittra <devopam@gmail.com>
>> wrote:
>>>
>>> May I suggest you simply do a ssh -l <keylessusername> using the
>>> previous and the new FQDNs that you have defined to verify which one is in
>>> effect, and accessible ?
>>> Also, since you changed the FQDN, you may wish to simply reboot the
>>> cluster once, just to make sure that new ones are in-place.
>>> It might happen that after the reboot you will need to redo the ssh
>>> keyless pairing once again (most probably)
>>>
>>> regards
>>> Devopam
>>>
>>>
>>> On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky <
>>> david.novogrodsky@gmail.com> wrote:
>>>>
>>>> The changes I am making in the hosts file are not being picked up by
>>>> the installation scripts of Ambari.  I was told I could make changes to the
>>>> hosts file and that Ambari would see them.  I have
>>>> checked the /etc/ambari-agent/conf/ambari-agent.ini file and the changes
>>>> I made to the hosts file are not showing up in that file.  Where is Ambari
>>>> getting the names for the other nodes in the cluster?
>>>>
>>>> Here are the changes I made to the hosts file on the host for the name
>>>> node:
>>>> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
>>>> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>> 192.168.200.144 datanode10.localdomain
>>>> 192.168.200.107 datanode01.localdomain
>>>> 192.168.200.143 namenode.localdomain namenode
>>>>
>>>> Since I made these changes Ambari can not discover any of the nodes in
>>>> the network.  None of them.
>>>>
>>>> I have not made these changes to the other nodes because I do not want
>>>> to make changes to the other nodes until I can see Ambari discover the host
>>>> it is sitting upon.
>>>>
>>>> Regarding the commands you mentioned, here are the results:
>>>> [root@localhost conf]# hostname -f
>>>> hostname: Unknown host
>>>> [root@localhost conf]# hostname
>>>> localhost.namenode
>>>> [root@localhost conf]#  python -c 'import socket; print socket.getfqdn()'
>>>> localhost.namenode
>>>>
>>>> localhost.namenode was the name I used for this host during the
>>>> installation of CentOS.   I thought you said I could make changes to the
>>>> hosts file and the installation scripts would recognize them?
>>>>
>>>> From the Confirm Hosts page I am getting the following errors:
>>>> for connecting to the name node
>>>>
>>>> STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host
>>>> cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network
>>>> connectivity between the Ambari Agent host and the Ambari Server"}
>>>>
>>>> for connecting to the datanode10
>>>>
>>>> INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
>>>> ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname
>>>>  (localhost.datanode10) does not match expected ambari server hostname
>>>> (datanode10.localdomain). Aborting registration. Please check hostname,
>>>> hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>>> ', None)
>>>>
>>>> I am getting a similar error when trying to get to datanode01.
>>>> Please note I used the following domain names for the following datanodes
>>>> when I installed CentOS:
>>>> datanode10 --> localhost.datanode10
>>>> datanode01 --> localhost.datanode01
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> David Novogrodsky
>>>> david.novogrodsky@gmail.com
>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>
>>>> On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako <yusaku@hortonworks.com>
>>>> wrote:
>>>>>
>>>>> Did you change the FQDNs like I proposed, like namenode.localdomain,
>>>>> rather than localhost.namenode?
>>>>> Did you ensure that the 3 commands returned the results as shown?
>>>>> Can each host resolve all the other hosts by name?
>>>>>
>>>>> If you want to get a cluster up and running on VMs, the best bet is to
>>>>> use:
>>>>> https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
>>>>>
>>>>> This sets up all /etc/hosts and other settings in the way you want.
>>>>> Then you can see how these VMs are being set up and mimic on your VMs
>>>>> if you'd rather set them up from scratch.
>>>>>
>>>>> I hope this helps.
>>>>> Yusaku
>>>>>
>>>>>
>>>>> On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky <
>>>>> david.novogrodsky@gmail.com> wrote:
>>>>>>
>>>>>> Ok, I removed the multiple instances of localhost.namenode.  It now
>>>>>> only appears on one line in the hosts file.
>>>>>>
>>>>>> The main Ambari server still cannot see the data nodes nor the node
>>>>>> Ambari is on.  Ambari is on the namenode.  When I run the install, the
>>>>>> install program cannot connect to any node in the network.
>>>>>>
>>>>>> Also I tried running /etc/init.d/network restart on one of the nodes,
>>>>>> datanode10 (a virtual machine).  Now that node cannot connect to the
>>>>>> internet.  I would like to send you the information but I am having
>>>>>> problems sending the document from the virtual machine.
>>>>>>
>>>>>> I do not have a DNS server.  These machines have hardwired IP addresses and
>>>>>> names in the hosts file. Did running /etc/init.d/network restart break the
>>>>>> connection?
>>>>>>
>>>>>>
>>>>>> David Novogrodsky
>>>>>> david.novogrodsky@gmail.com
>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>
>>>>>> On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako <yusaku@hortonworks.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> You can just make the changes in /etc/hosts.  You might also
>>>>>>> change /etc/sysconfig/network and run /etc/init.d/network restart.
>>>>>>>
>>>>>>> Then make sure that running the 3 commands returns the expected results.
>>>>>>>
>>>>>>> Yusaku
>>>>>>>
>>>>>>> On Fri, Dec 12, 2014 at 9:06 PM, David Novogrodsky <
>>>>>>> david.novogrodsky@gmail.com> wrote:
>>>>>>>>
>>>>>>>> When I installed CentOS on the machines, I chose those names,
>>>>>>>> localhost.datanode01 and so on.  You mean I have to reinstall CentOS on
>>>>>>>> the machines again?
>>>>>>>>
>>>>>>>> Can I just make the changes in the hosts files?
>>>>>>>>
>>>>>>>> Will I need to recreate the SSH keys?
>>>>>>>>
>>>>>>>> David Novogrodsky
>>>>>>>> david.novogrodsky@gmail.com
>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>
>>>>>>>> On Fri, Dec 12, 2014 at 6:21 PM, Yusaku Sako <
>>>>>>>> yusaku@hortonworks.com> wrote:
>>>>>>>>
>>>>>>>>> I would set it up like this:
>>>>>>>>>
>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4   *<- do not list the hostname here.*
>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>> xxx.xxx.200.144 datanode10.localdomain
>>>>>>>>> xxx.xxx.200.107 datanode01.localdomain
>>>>>>>>> xxx.xxx.200.143 namenode.localdomain namenode
>>>>>>>>>
>>>>>>>>> With this change:
>>>>>>>>> * *hostname -f* should display *namenode.localdomain*
>>>>>>>>> * *hostname* should display *namenode*
>>>>>>>>> * *python -c 'import socket; print socket.getfqdn()'* should
>>>>>>>>> display *namenode.localdomain*
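
The three checks above can be bundled into one script. This is a sketch in Python 3 (the one-liner quoted above uses Python 2 print syntax). `compare_names` is a hypothetical helper, and the expected value is the one from the proposed hosts file.

```python
import socket


def compare_names(short_name, fqdn, expected_fqdn):
    """Return a list of mismatches between actual and expected names."""
    problems = []
    if fqdn != expected_fqdn:
        problems.append("FQDN is %r, expected %r" % (fqdn, expected_fqdn))
    expected_short = expected_fqdn.split(".", 1)[0]
    if short_name != expected_short:
        problems.append(
            "hostname is %r, expected %r" % (short_name, expected_short))
    return problems


if __name__ == "__main__":
    # socket.gethostname() roughly mirrors `hostname`; socket.getfqdn() is
    # the call used in the quoted one-liner.
    issues = compare_names(socket.gethostname(), socket.getfqdn(),
                           "namenode.localdomain")
    print("\n".join(issues) or "names match")
```

Run on each node with that node's expected FQDN; an empty result means the names match what the hosts file is meant to produce.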
>>>>>>>>>
>>>>>>>>> I hope this helps.
>>>>>>>>> Yusaku
>>>>>>>>>
>>>>>>>>> On Fri, Dec 12, 2014 at 3:52 PM, David Novogrodsky <
>>>>>>>>> david.novogrodsky@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> I am having a problem with Ambari.
>>>>>>>>>> I am trying to use Ambari to install Hadoop on a three-node
>>>>>>>>>> cluster. The name node is where the Ambari server is located. I am getting
>>>>>>>>>> this error:
>>>>>>>>>> ERROR 2014-12-12 17:39:56,963 main.py:137 - Ambari agent machine
>>>>>>>>>> hostname (localhost.localdomain) does not match expected ambari server
>>>>>>>>>> hostname (namenode). Aborting registration. Please check hostname, hostname
>>>>>>>>>> -f and /etc/hosts file to confirm your hostname is setup correctly
>>>>>>>>>> ', None)
>>>>>>>>>>
>>>>>>>>>> Here is the contents of my hosts file:
>>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 localhost.namenode namenode
>>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>>> xxx.xxx.200.144 localhost.datanode10
>>>>>>>>>> xxx.xxx.200.107 localhost.datanode01
>>>>>>>>>> xxx.xxx.200.143 localhost.namenode namenode
>>>>>>>>>>
>>>>>>>>>> I am not sure what the problem is. Since there are only four
>>>>>>>>>> steps to run Ambari, there is not a lot of background to determine the cause
>>>>>>>>>> of this problem.
>>>>>>>>>>
>>>>>>>>>> David Novogrodsky
>>>>>>>>>> david.novogrodsky@gmail.com
>>>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Devopam Mittra
>>> Life and Relations are not binary
>>>
>>
>
> --
> Devopam Mittra
> Life and Relations are not binary
>
