ambari-user mailing list archives

From Devopam Mittra <devo...@gmail.com>
Subject Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
Date Wed, 17 Dec 2014 04:20:27 GMT
Forgive me if I sound rude, but please re-read the installation
instructions carefully; they should resolve your problem.

1. Use a consistent naming convention for all your boxes, e.g.
namenode01.localdomain, datanode01.localdomain, datanode0N.localdomain.
This will help you a lot with future expansion and maintenance of your
cluster.
2. Do not, by any means, tamper with the /etc/hosts entries for 127.0.0.1
and ::1. Leave only the localhost names on those lines. Changing them
interferes with normal operation of the box, because internal OS lookups
depend on them.
3. If you have DHCP plus a very good DNS server in place, fine. Otherwise,
assign static IPs to your machines and create one /etc/hosts entry per box
with its FQDN and static IP address, replicated on ALL the boxes.
4. Set up keyless SSH login for root, or for any other uniform local user
that you want to use to manage Ambari and Hadoop.
5. Confirm that the namenode and the Ambari server machine (in case they
are different for you) can talk to ALL the machines using a keyless login
for the universal user you created in the steps above.
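
Points 2 and 3 can be checked mechanically. What follows is only a rough
sketch, not anything that ships with Ambari: the function names
(parse_hosts, lint_hosts) are made up for illustration, and the set of
allowed loopback names is an assumption taken from the default CentOS 6
entries quoted further down this thread.

```python
# Hypothetical helper, not part of Ambari. The allowed loopback names are
# an assumption based on the default CentOS 6 /etc/hosts shown in this thread.
LOOPBACK_IPS = {"127.0.0.1", "::1"}
DEFAULT_LOOPBACK_NAMES = {
    "localhost", "localhost.localdomain",
    "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6",
}


def parse_hosts(text):
    """Return a list of (ip, [names]) tuples from /etc/hosts-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        entries.append((ip, names))
    return entries


def lint_hosts(text, cluster_fqdns):
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    seen = {}
    for ip, names in parse_hosts(text):
        if ip in LOOPBACK_IPS:
            # Point 2: loopback lines should carry only the localhost names.
            for name in names:
                if name not in DEFAULT_LOOPBACK_NAMES:
                    problems.append(f"{ip} should not list {name}")
            continue
        for name in names:
            seen[name] = ip
    # Point 3: every box in the cluster needs an entry under its FQDN.
    for fqdn in cluster_fqdns:
        if fqdn not in seen:
            problems.append(f"no entry for {fqdn}")
    return problems
```

On a real box this could be called as
lint_hosts(open("/etc/hosts").read(), ["namenode.localdomain", ...]) with
whatever FQDNs you settle on, and run on every machine, since the file has
to be identical everywhere.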

Hope the above helps you sort out the issue in one go.
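
One further check that may be worth running once the hosts files are in
place: the agent log quoted below shows namenode.localdomain resolving to
98.124.198.1, while the hosts file maps 192.168.200.143 to
namenode.localdomain.com, so the name the agent uses is evidently not the
one in the file. The sketch below compares what each FQDN resolves to
against the static IP you intended. The function name and the idea of
passing in a resolver are my own illustration, not an Ambari facility;
socket.gethostbyname is the standard-library call used by default.

```python
import socket


def find_resolution_mismatches(expected, resolve=socket.gethostbyname):
    """Return {fqdn: description} for every name that resolves wrongly.

    expected: dict mapping each FQDN to the static IP it should resolve to.
    resolve:  callable taking a name and returning an IP string; it can be
              swapped out so the check is testable without a network.
    """
    mismatches = {}
    for fqdn, want in expected.items():
        try:
            got = resolve(fqdn)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            mismatches[fqdn] = f"does not resolve ({exc})"
            continue
        if got != want:
            mismatches[fqdn] = f"resolves to {got}, expected {want}"
    return mismatches
```

Run it on every box with the same expected mapping; an empty result on all
of them suggests the names are at least consistent, though it says nothing
about whether the Ambari server is actually listening on port 8440.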

regards
Dev



On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky <
david.novogrodsky@gmail.com> wrote:
>
> There is nothing simply done in Ambari.  :)
>
> After changing the name of this computer and restarting the namenode, Ambari
> does not recognize any node.  The main error I am wondering about is this:
> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at
> https://namenode.localdomain:8440 (98.124.198.1)
> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to
> https://namenode.localdomain:8440/ca
> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to
> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
> refused
> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at
> https://namenode.localdomain:8440 is not reachable, sleeping for 10
> seconds...
> ', None)
> Why is Ambari using namenode.localdomain to connect?
>
> I am running Ambari on this node; I am running Ambari on the namenode of
> this cluster.  The /etc/hosts file for this computer is this:
>
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> 192.168.200.144 localhost.datanode10
> 192.168.200.107 localhost.datanode01
> 192.168.200.143 namenode.localdomain.com namenode
>
> The Ambari wizard said I needed to use fully qualified domain names, so
>
> What follows is the detailed registration log.  I get this error
> in the registration log for namenode.localdomain.com:
> --
> ==========================
> Creating target directory...
> ==========================
>
> Command start time 2014-12-16 12:02:18
>
> Connection to namenode.localdomain.com closed.
> SSH command execution finished
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:18
>
> ==========================
> Copying common functions script...
> ==========================
>
> Command start time 2014-12-16 12:02:18
>
> scp /usr/lib/python2.6/site-packages/ambari_commons
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:18
>
> ==========================
> Copying OS type check script...
> ==========================
>
> Command start time 2014-12-16 12:02:18
>
> scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:18
>
> ==========================
> Running OS type check...
> ==========================
>
> Command start time 2014-12-16 12:02:18
> Cluster primary/cluster OS type is redhat6 and local/current OS type is
> redhat6
>
> Connection to namenode.localdomain.com closed.
> SSH command execution finished
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:19
>
> ==========================
> Checking 'sudo' package on remote host...
> ==========================
>
> Command start time 2014-12-16 12:02:19
> sudo-1.8.6p3-15.el6.x86_64
>
> Connection to namenode.localdomain.com closed.
> SSH command execution finished
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:20
>
> ==========================
> Copying repo file to 'tmp' folder...
> ==========================
>
> Command start time 2014-12-16 12:02:20
>
> scp /etc/yum.repos.d/ambari.repo
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:20
>
> ==========================
> Moving file to repo dir...
> ==========================
>
> Command start time 2014-12-16 12:02:20
>
> Connection to namenode.localdomain.com closed.
> SSH command execution finished
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:21
>
> ==========================
> Copying setup script file...
> ==========================
>
> Command start time 2014-12-16 12:02:21
>
> scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:21
>
> ==========================
> Running setup agent script...
> ==========================
>
> Command start time 2014-12-16 12:02:21
> Verifying Python version compatibility...
> Using python  /usr/bin/python2.6
> Found ambari-agent PID: 5036
> Stopping ambari-agent
> Removing PID file at /var/run/ambari-agent/ambari-agent.pid
> ambari-agent successfully stopped
> Restarting ambari-agent
> Verifying Python version compatibility...
> Using python  /usr/bin/python2.6
> ambari-agent is not running. No PID found at
> /var/run/ambari-agent/ambari-agent.pid
> Verifying Python version compatibility...
> Using python  /usr/bin/python2.6
> Checking for previously running Ambari Agent...
> Starting ambari-agent
> Verifying ambari-agent process status...
> Ambari Agent successfully started
> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
> Agent out at: /var/log/ambari-agent/ambari-agent.out
> Agent log at: /var/log/ambari-agent/ambari-agent.log
> ('WARNING 2014-12-16 12:01:59,642 NetUtil.py:92 - Server at
> https://namenode.localdomain:8440 is not reachable, sleeping for 10
> seconds...
> INFO 2014-12-16 12:02:09,653 NetUtil.py:48 - Connecting to
> https://namenode.localdomain:8440/ca
> WARNING 2014-12-16 12:02:09,701 NetUtil.py:71 - Failed to connect to
> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
> refused
> WARNING 2014-12-16 12:02:09,701 NetUtil.py:92 - Server at
> https://namenode.localdomain:8440 is not reachable, sleeping for 10
> seconds...
> INFO 2014-12-16 12:02:19,711 NetUtil.py:48 - Connecting to
> https://namenode.localdomain:8440/ca
> WARNING 2014-12-16 12:02:19,770 NetUtil.py:71 - Failed to connect to
> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
> refused
> WARNING 2014-12-16 12:02:19,770 NetUtil.py:92 - Server at
> https://namenode.localdomain:8440 is not reachable, sleeping for 10
> seconds...
> INFO 2014-12-16 12:02:22,680 main.py:83 - loglevel=logging.INFO
> INFO 2014-12-16 12:02:22,681 main.py:55 - signal received, exiting.
> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:39 - Removing pid file
> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:46 - Removing temp files
> INFO 2014-12-16 12:02:29,532 main.py:83 - loglevel=logging.INFO
> INFO 2014-12-16 12:02:29,533 DataCleaner.py:36 - Data cleanup thread
> started
> INFO 2014-12-16 12:02:29,534 DataCleaner.py:117 - Data cleanup started
> INFO 2014-12-16 12:02:29,542 DataCleaner.py:119 - Data cleanup finished
> INFO 2014-12-16 12:02:29,667 PingPortListener.py:51 - Ping port listener
> started on port: 8670
> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at
> https://namenode.localdomain:8440 (98.124.198.1)
> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to
> https://namenode.localdomain:8440/ca
> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to
> https://namenode.localdomain:8440/ca due to [Errno 111] Connection
> refused
> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at
> https://namenode.localdomain:8440 is not reachable, sleeping for 10
> seconds...
> ', None)
>
> Connection to namenode.localdomain.com closed.
> SSH command execution finished
> host=namenode.localdomain.com, exitcode=0
> Command end time 2014-12-16 12:02:32
>
> Registering with the server...
> Registration with the server failed.
> ----
>
> David Novogrodsky
> david.novogrodsky@gmail.com
> http://www.linkedin.com/in/davidnovogrodsky
>
> On Mon, Dec 15, 2014 at 10:02 PM, Devopam Mittra <devopam@gmail.com>
> wrote:
>>
>> May I suggest you simply do a ssh -l <keylessusername> using the previous
>> and the new FQDNs that you have defined to verify which one is in effect,
>> and accessible ?
>> Also, since you changed the FQDN, you may wish to simply reboot the
>> cluster once, just to make sure that new ones are in-place.
>> It might happen that after the reboot you will need to redo the ssh
>> keyless pairing once again (most probably)
>>
>> regards
>> Devopam
>>
>>
>> On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky <
>> david.novogrodsky@gmail.com> wrote:
>>>
>>> The changes I am making in the hosts file are not being picked up by the
>>> installation scripts of Ambari.  I was told I could make changes to the
>>> hosts file and that Ambari would see them.  I have checked the
>>> /etc/ambari-agent/conf/ambari-agent.ini file and the changes I made to the
>>> hosts file are not showing up in that file.  Where is Ambari getting the
>>> names for the other nodes in the cluster?
>>>
>>> Here are the changes I made to the hosts file on the host for the name
>>> node:
>>> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
>>> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>>> 192.168.200.144 datanode10.localdomain
>>> 192.168.200.107 datanode01.localdomain
>>> 192.168.200.143 namenode.localdomain namenode
>>>
>>> Since I made these changes Ambari can not discover any of the nodes in
>>> the network.  None of them.
>>>
>>> I have not made these changes to the other nodes because I do not want
>>> to make changes to the other nodes until I can see Ambari discover the host
>>> it is sitting upon.
>>>
>>> Regarding the commands you mentioned, here are the results:
>>> [root@localhost conf]# hostname -f
>>> hostname: Unknown host
>>> [root@localhost conf]# hostname
>>> localhost.namenode
>>> [root@localhost conf]#  python -c 'import socket; print socket.getfqdn()'
>>> localhost.namenode
>>>
>>> localhost.namenode was the name I used for this host during the
>>> installation of CentOS.   I thought you said I could make changes to the
>>> hosts file and the installation scripts would recognize them?
>>>
>>> From the Confirm Hosts page I am getting the following errors:
>>> for connecting to the name node
>>>
>>> STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host
>>> cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network
>>> connectivity between the Ambari Agent host and the Ambari Server"}
>>>
>>> for connecting to the datanode10
>>>
>>> INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
>>> ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname
>>>  (localhost.datanode10) does not match expected ambari server hostname
>>> (datanode10.localdomain). Aborting registration. Please check hostname,
>>> hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>> ', None)
>>>
>>> I am getting a similar error when trying to reach datanode01.
>>> Please note I used the following host names for the datanodes
>>> when I installed CentOS:
>>> datanode10 --> localhost.datanode10
>>> datanode01 --> localhost.datanode01
>>>
>>>
>>>
>>>
>>>
>>> David Novogrodsky
>>> david.novogrodsky@gmail.com
>>> http://www.linkedin.com/in/davidnovogrodsky
>>>
>>> On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako <yusaku@hortonworks.com>
>>> wrote:
>>>>
>>>> Did you change the FQDNs like I proposed, like namenode.localdomain,
>>>> rather than localhost.namenode?
>>>> Did you ensure that the 3 commands returned the results as shown?
>>>> Can each host resolve all the other hosts by name?
>>>>
>>>> If you want to get a cluster up and running on VMs, the best bet is to
>>>> use:
>>>> https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
>>>>
>>>> This sets up all /etc/hosts and other settings in the way you want.
>>>> Then you can see how these VMs are being set up and mimic on your VMs
>>>> if you'd rather set them up from scratch.
>>>>
>>>> I hope this helps.
>>>> Yusaku
>>>>
>>>>
>>>> On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky <
>>>> david.novogrodsky@gmail.com> wrote:
>>>>>
>>>>> Ok, I removed the multiple instances of localhost.namenode.  It now
>>>>> only appears on one line in the hosts file.
>>>>>
>>>>> The main ambari server still cannot see the data nodes nor the node
>>>>> Ambari is on.  Ambari is on the namenode.  When I run the install, the
>>>>> install program can not connect to any node in the network.
>>>>>
>>>>> Also I tried running /etc/init.d/network restart on one of the nodes;
>>>>> datanode10 ( a virtual machine).  Now that node cannot connect to the
>>>>> internet....I would like to send you the information but I am having
>>>>> problems setting the document from the virtual machine.
>>>>>
>>>>> I do not have a DNS server.  These machines have hardwired IP addresses and
>>>>> names in the hosts file. Did running /etc/init.d/network restart break the
>>>>> connection?
>>>>>
>>>>>
>>>>> David Novogrodsky
>>>>> david.novogrodsky@gmail.com
>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>
>>>>> On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako <yusaku@hortonworks.com>
>>>>> wrote:
>>>>>>
>>>>>> You can just make the changes in /etc/hosts.  You might also
>>>>>> change /etc/sysconfig/network and run /etc/init.d/network restart.
>>>>>>
>>>>>> Then make sure that the 3 commands return the expected results.
>>>>>>
>>>>>> Yusaku
>>>>>>
>>>>>> On Fri, Dec 12, 2014 at 9:06 PM, David Novogrodsky <
>>>>>> david.novogrodsky@gmail.com> wrote:
>>>>>>>
>>>>>>> When I installed CentOS on the machines, I chose those names,
>>>>>>> localhost.datanode01...and so on.  You mean I have to reinstall CentOS on
>>>>>>> the machines again?
>>>>>>>
>>>>>>> Can I just make the changes in the hosts files?
>>>>>>>
>>>>>>> Will I need to recreate the SSH keys?
>>>>>>>
>>>>>>> David Novogrodsky
>>>>>>> david.novogrodsky@gmail.com
>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>
>>>>>>> On Fri, Dec 12, 2014 at 6:21 PM, Yusaku Sako <yusaku@hortonworks.com> wrote:
>>>>>>>
>>>>>>>> I would set it up like this:
>>>>>>>>
>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4   <- do not list the hostname here.
>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>> xxx.xxx.200.144 datanode10.localdomain
>>>>>>>> xxx.xxx.200.107 datanode01.localdomain
>>>>>>>> xxx.xxx.200.143 namenode.localdomain namenode
>>>>>>>>
>>>>>>>> With this change:
>>>>>>>> * *hostname -f* should display *namenode.localdomain*
>>>>>>>> * *hostname* should display *namenode*
>>>>>>>> * *python -c 'import socket; print socket.getfqdn()' *should
>>>>>>>> display *namenode.localdomain*
>>>>>>>>
>>>>>>>> I hope this helps.
>>>>>>>> Yusaku
>>>>>>>>
>>>>>>>> On Fri, Dec 12, 2014 at 3:52 PM, David Novogrodsky <
>>>>>>>> david.novogrodsky@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> I am having a problem with Ambari.
>>>>>>>>> I am trying to use Ambari to install Hadoop to a three-node
>>>>>>>>> cluster. The name node is where the Ambari server is located. I am getting
>>>>>>>>> this error:
>>>>>>>>> ERROR 2014-12-12 17:39:56,963 main.py:137 - Ambari agent machine
>>>>>>>>> hostname (localhost.localdomain) does not match expected ambari server
>>>>>>>>> hostname (namenode). Aborting registration. Please check hostname, hostname
>>>>>>>>> -f and /etc/hosts file to confirm your hostname is setup correctly
>>>>>>>>> ', None)
>>>>>>>>>
>>>>>>>>> Here is the contents of my hosts file:
>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 localhost.namenode namenode
>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>> xxx.xxx.200.144 localhost.datanode10
>>>>>>>>> xxx.xxx.200.107 localhost.datanode01
>>>>>>>>> xxx.xxx.200.143 localhost.namenode namenode
>>>>>>>>>
>>>>>>>>> I am not sure what the problem is. Since there are only four steps
>>>>>>>>> to run ambari there is not a lot of background to determine the cause of
>>>>>>>>> this problem.
>>>>>>>>>
>>>>>>>>> David Novogrodsky
>>>>>>>>> david.novogrodsky@gmail.com
>>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> Devopam Mittra
>> Life and Relations are not binary
>>
>

-- 
Devopam Mittra
Life and Relations are not binary
