ambari-user mailing list archives

From David Novogrodsky <david.novogrod...@gmail.com>
Subject Re: Problem with Ambari 1.7 recognizing hosts running CentOS 6
Date Thu, 18 Dec 2014 02:32:13 GMT
Please forgive me if I am sending this twice:

I am having a problem with Ambari not recognizing nodes on a network.
The cluster is using CentOS 6.  I am trying to install HDP 2.1.  I
have the following values in my hosts file:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com
192.168.200.107 datanode01.localdomain.com
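
One way to confirm that every name in the hosts file resolves to the address listed there is a small check like the one below. This is only a sketch: the function names parse_hosts and check_resolution are made up for illustration, and HOSTS_TEXT simply repeats the three cluster entries shown above.

```python
import socket

# The three cluster entries from the hosts file above.
HOSTS_TEXT = """\
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com
192.168.200.107 datanode01.localdomain.com
"""


def parse_hosts(text):
    """Return {hostname: ip} for every non-comment line of hosts-file text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            entries[name] = ip
    return entries


def check_resolution(entries):
    """Return {hostname: (expected_ip, resolved_ip_or_None)} using the local resolver."""
    results = {}
    for name, expected_ip in entries.items():
        try:
            resolved_ip = socket.gethostbyname(name)
        except socket.error:
            resolved_ip = None  # the name did not resolve at all
        results[name] = (expected_ip, resolved_ip)
    return results
```

Calling check_resolution(parse_hosts(HOSTS_TEXT)) on each node and comparing the two addresses per host would show whether any node sees a different mapping.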

When I try to connect from namenode.localdomain.com to
datanode10.localdomain.com, I get this error in the registration log:

==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com
==========================
....
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
("WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
...
Connection to datanode10.localdomain.com closed.
SSH command execution finished
host=datanode10.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:26  datanode10.localdomain.com
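
The server name in these lines has the label "namenode" appended many times over. A short script can flag that pattern in an agent log automatically. This is a sketch only; repeated_labels and suspicious_hosts are illustrative names, not part of Ambari.

```python
import re

# Matches the host part of an http(s) URL; stops at ':' (port) or '/'.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")


def repeated_labels(hostname):
    """Return {label: count} for labels that occur more than once in a dotted name."""
    counts = {}
    for label in hostname.strip(".").split("."):
        counts[label] = counts.get(label, 0) + 1
    return dict((label, n) for label, n in counts.items() if n > 1)


def suspicious_hosts(log_text):
    """Return {hostname: repeated labels} for every URL host in the log that repeats a label."""
    found = {}
    for host in URL_HOST.findall(log_text):
        dupes = repeated_labels(host)
        if dupes:
            found[host] = dupes
    return found
```

Running suspicious_hosts over the contents of /var/log/ambari-agent/ambari-agent.log on each node would list any mangled server names it contains.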


What follows is more detail.

I also made some changes to the
/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py file:
  def run(self):
    sshcommand = ["ssh",
                  "-o", "ConnectTimeOut=60",
                  "-o", "StrictHostKeyChecking=no",
                  "-o", "BatchMode=yes",
                  "-tt", # Should prevent "tput: No value for $TERM and no -T specified" warning
                  "-i", self.sshkey_file,
                  self.user + "@" + self.host, self.command]
    if DEBUG:
      self.host_log.write("Running ssh command " + ' '.join(sshcommand))
    self.host_log.write("==========================")
    self.host_log.write("\nCommand start time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') +
                        "  " + self.host + "  " + self.user +
                        "  " + self.sshkey_file + "  " + self.command)
    #self.host_log.write("djn:BOOTSTRAP the value is:" + self.host)
    sshstat = subprocess.Popen(sshcommand, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    log = sshstat.communicate()
    errorMsg = log[1]
    if self.errorMessage and sshstat.returncode != 0:
      errorMsg = self.errorMessage + "\n" + errorMsg
    log = log[0] + "\n" + errorMsg
    self.host_log.write(log)
    self.host_log.write("SSH command execution finished")
    self.host_log.write("host=" + self.host + ", exitcode=" +
                        str(sshstat.returncode))
    self.host_log.write("Command end time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') +
                        "  " + self.host)
    return {"exitstatus": sshstat.returncode, "log": log, "errormsg": errorMsg}

I added some extra information to what is written to host_log.  The
information includes self.host, self.user, self.sshkey_file and
self.command.

When I run the web front end, I get two different results.  First I
will detail the connection to namenode.localdomain.com; second, the
connection to datanode10.localdomain.com.

The connection to namenode.localdomain.com is successful.  Here is
the important part of the registration log:

==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: namenode.localdomain.com
==========================
Command start time 2014-12-17 16:23:17  namenode.localdomain.com  root
 /var/run/ambari-server/bootstrap/25/sshKey  sudo python
/var/lib/ambari-agent/data/tmp/setupAgent1418854996.py
namenode.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Found ambari-agent PID: 32172
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped
Restarting ambari-agent
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
ambari-agent is not running. No PID found at
/var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
('INFO 2014-12-17 16:22:56,352 Heartbeat.py:78 - Building Heartbeat:
{responseId = 17, timestamp = 1418854976352, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:22:56,407 Controller.py:214 - Heartbeat response
received (id = 18)
INFO 2014-12-17 16:22:56,408 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:06,409 Heartbeat.py:78 - Building Heartbeat:
{responseId = 18, timestamp = 1418854986409, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:23:13,422 HostCheckReportFileHandler.py:43 - Host
check report at /var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,423 HostCheckReportFileHandler.py:104 -
Removing old host check file at
/var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,423 HostCheckReportFileHandler.py:109 -
Creating host check file at
/var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,491 Controller.py:214 - Heartbeat response
received (id = 19)
INFO 2014-12-17 16:23:13,492 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:21,942 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:23,493 Heartbeat.py:78 - Building Heartbeat:
{responseId = 19, timestamp = 1418855003493, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:23:23,544 Controller.py:214 - Heartbeat response
received (id = 20)
INFO 2014-12-17 16:23:23,544 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:28,845 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:28,846 DataCleaner.py:36 - Data cleanup thread started
INFO 2014-12-17 16:23:28,847 DataCleaner.py:117 - Data cleanup started
INFO 2014-12-17 16:23:28,857 DataCleaner.py:119 - Data cleanup finished
INFO 2014-12-17 16:23:28,967 PingPortListener.py:51 - Ping port
listener started on port: 8670
INFO 2014-12-17 16:23:28,968 main.py:233 - Connecting to Ambari server
at https://namenode.localdomain.com:8440 (192.168.200.143)
INFO 2014-12-17 16:23:28,969 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com:8440/ca
', None)

Connection to namenode.localdomain.com closed.
SSH command execution finished
host=namenode.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:31  namenode.localdomain.com

The connection to datanode10.localdomain.com does not work.  Here
is the registration log for that attempt:
==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com
==========================

Command start time 2014-12-17 16:23:16  datanode10.localdomain.com
root  /var/run/ambari-server/bootstrap/25/sshKey  sudo python
/var/lib/ambari-agent/data/tmp/setupAgent1418854996.py
datanode10.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Found ambari-agent PID: 7325
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped
Restarting ambari-agent
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
ambari-agent is not running. No PID found at
/var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
("WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:00,391 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:10,402 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:10,402 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:10,402 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:17,959 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:17,959 main.py:55 - signal received, exiting.
INFO 2014-12-17 16:23:17,960 ProcessHelper.py:39 - Removing pid file
INFO 2014-12-17 16:23:17,960 ProcessHelper.py:46 - Removing temp files
INFO 2014-12-17 16:23:23,639 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:23,639 DataCleaner.py:36 - Data cleanup thread started
INFO 2014-12-17 16:23:23,641 DataCleaner.py:117 - Data cleanup started
INFO 2014-12-17 16:23:23,642 DataCleaner.py:119 - Data cleanup finished
INFO 2014-12-17 16:23:23,678 PingPortListener.py:51 - Ping port
listener started on port: 8670
WARNING 2014-12-17 16:23:23,678 main.py:235 - Unable to determine the
IP address of the Ambari server
'namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode'
INFO 2014-12-17 16:23:23,678 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:23,679 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:23,679 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
", None)

Connection to datanode10.localdomain.com closed.
SSH command execution finished
host=datanode10.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:26  datanode10.localdomain.com

Registering with the server...
Registration with the server failed.
===============================
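
Since the agent on datanode10 keeps trying the mangled server name, it may help to print the server hostname the agent is configured with. The sketch below assumes the agent reads it from the "hostname" key in the [server] section of /etc/ambari-agent/conf/ambari-agent.ini; that path and layout are assumptions on my part and are not shown anywhere in the logs above. server_hostname is an illustrative name.

```python
try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2 (e.g. the 2.6 used on CentOS 6)

# Assumed default location of the agent configuration file.
AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"


def server_hostname(ini_path=AGENT_INI):
    """Return the value of 'hostname' in the [server] section of the given ini file."""
    parser = configparser.RawConfigParser()
    if not parser.read(ini_path):
        raise IOError("could not read " + ini_path)
    return parser.get("server", "hostname")
```

If the value returned on datanode10 already contains the repeated ".namenode" labels, the bad name is stored in the agent configuration rather than produced by DNS.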


To double-check, I ran the following command by hand, built from the
sshcommand list in the bootstrap.py script:
[root@namenode ~]# ssh -v -o ConnectTimeOut=60 -o
StrictHostKeyChecking=no -o BatchMode=yes -tt -i /root/Desktop/id_rsa
root@datanode10.localdomain.com "[ -d /var/lib/ambari-agent/data/tmp ]
|| sudo mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp"

The command worked and exited with a code of 0.  More detail follows.


I added the -v option.  The path to the id_rsa key file is the same
one that I entered on the first page of the wizard.  The result is
as follows:
[root@namenode ~]# ssh -v -o ConnectTimeOut=60 -o
StrictHostKeyChecking=no -o BatchMode=yes -tt -i /root/Desktop/id_rsa
root@datanode10.localdomain.com "[ -d /var/lib/ambari-agent/data/tmp ]
|| sudo mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp"
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to datanode10.localdomain.com [192.168.200.144] port 22.
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/Desktop/id_rsa type 1
debug1: identity file /root/Desktop/id_rsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'datanode10.localdomain.com' is known and matches the RSA host key.
debug1: Found key in /root/.ssh/known_hosts:13
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue:
publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Unspecified GSS failure.  Minor code may provide more information


debug1: Unspecified GSS failure.  Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found

debug1: Next authentication method: publickey
debug1: Offering public key: /root/Desktop/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env XMODIFIERS = @im=none
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending command: [ -d /var/lib/ambari-agent/data/tmp ] || sudo
mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 1
Connection to datanode10.localdomain.com closed.
Transferred: sent 2952, received 2352 bytes, in 0.0 seconds
Bytes per second: sent 106095.7, received 84531.6
debug1: Exit status 0
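
The manual check above can also be scripted, so that the same options bootstrap.py uses are tried against each host in turn. This is a rough sketch that mirrors the sshcommand list quoted earlier; build_ssh_command and run_ssh are illustrative names, not Ambari functions.

```python
import subprocess


def build_ssh_command(user, host, key_file, remote_command):
    """Build the same argument list that the quoted bootstrap.py run() method uses."""
    return ["ssh",
            "-o", "ConnectTimeOut=60",
            "-o", "StrictHostKeyChecking=no",
            "-o", "BatchMode=yes",
            "-tt",
            "-i", key_file,
            user + "@" + host, remote_command]


def run_ssh(user, host, key_file, remote_command):
    """Run the command over ssh and return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(build_ssh_command(user, host, key_file, remote_command),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For example, run_ssh("root", "datanode10.localdomain.com", "/root/Desktop/id_rsa", "hostname -f") would report the fully qualified name the remote host believes it has.
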
David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky


On Wed, Dec 17, 2014 at 6:14 PM, Jeff Sposetti <jeff@hortonworks.com> wrote:
> Hi David, Try sending in plain/text, not HTML.
>
>
> On Wed, Dec 17, 2014 at 7:10 PM, David Novogrodsky
> <david.novogrodsky@gmail.com> wrote:
>>
>> I am having problems adding more information to this post:
>> Delivery to the following recipient failed permanently:
>>
>>      user@ambari.apache.org
>>
>> Technical details of permanent failure:
>> Google tried to deliver your message, but it was rejected by the server
>> for the recipient domain ambari.apache.org by
>> mx1.eu.apache.org.[192.87.106.230].
>>
>> The error that the other server returned was:
>> 552 spam score (6.3) exceeded threshold
>> (HTML_MESSAGE,LONGWORDS,RCVD_IN_DNSWL_LOW,SPF_PASS,SPOOF_COM2OTH,WEIRD_PORT
>>
>> David Novogrodsky
>> david.novogrodsky@gmail.com
>> http://www.linkedin.com/in/davidnovogrodsky
>>
>> On Wed, Dec 17, 2014 at 1:12 PM, David Novogrodsky
>> <david.novogrodsky@gmail.com> wrote:
>>>
>>> The error from the registration log is as follows:
>>> ==========================
>>> Running setup agent script...
>>> ==========================
>>> Agent log at: /var/log/ambari-agent/ambari-agent.log
>>> ("WARNING 2014-12-17 10:43:08,349 NetUtil.py:92 - Server at
>>> https://namenode.
>>> localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
>>> is not reachable, sleeping for 10 seconds...
>>>
>>> David Novogrodsky
>>> david.novogrodsky@gmail.com
>>> http://www.linkedin.com/in/davidnovogrodsky
>>>
