ambari-dev mailing list archives

From "Tuong Truong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-11854) ambari-agent fails to start when node has multiple network cards with some does not have IP address
Date Thu, 11 Jun 2015 02:09:00 GMT
Tuong Truong created AMBARI-11854:
-------------------------------------

             Summary: ambari-agent fails to start when node has multiple network cards with
some does not have IP address
                 Key: AMBARI-11854
                 URL: https://issues.apache.org/jira/browse/AMBARI-11854
             Project: Ambari
          Issue Type: Bug
          Components: ambari-agent
    Affects Versions: 2.1.0.
         Environment: AMD
            Reporter: Tuong Truong
             Fix For: 2.1.0.


In a cluster whose nodes have multiple network interfaces, ambari-agent fails to start on a node
when one or more of its active (UP) network interfaces is not bound to an IP address.

/var/log/ambari-agent/ambari-agent.out shows:

Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 354, in run
    self.register = Register(self.config)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 34, in __init__
    self.hardware = Hardware()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 41, in __init__
    self.hardware.update(Facter().facterInfo())
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 466, in facterInfo
    facterInfo = super(FacterLinux, self).facterInfo()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 161, in facterInfo
    facterInfo['netmask'] = self.getNetmask()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 384, in getNetmask
    if primary_ip == self.get_ip_address_by_ifname(i.strip()).strip():
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 397, in get_ip_address_by_ifname
    struct.pack('256s', ifname[:15])
IOError: [Errno 99] Cannot assign requested address
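
The last frames show get_ip_address_by_ifname packing the interface name into a 256-byte buffer, which suggests the common SIOCGIFADDR ioctl recipe. The full Ambari code is not shown here, so the Python 3 sketch below is an assumption about that kind of lookup, not the agent's actual implementation; the function name get_ipv4_address is hypothetical. On Linux, SIOCGIFADDR on an interface that exists but has no IPv4 address fails with errno 99 (EADDRNOTAVAIL), the same error as in the traceback.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def get_ipv4_address(ifname):
    """Return the IPv4 address bound to ifname (Linux only).

    Raises OSError with errno 99 (EADDRNOTAVAIL) if the interface exists
    but has no IPv4 address, and errno 19 (ENODEV) if it does not exist.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq begins with a 16-byte name field (IFNAMSIZ), so at
        # most 15 characters fit before the NUL terminator.
        ifreq = struct.pack('256s', ifname[:15].encode())
        result = fcntl.ioctl(s.fileno(), SIOCGIFADDR, ifreq)
    # sockaddr_in starts at offset 16; sin_addr is 4 bytes into it.
    return socket.inet_ntoa(result[20:24])
```

On the node whose ifconfig output is shown below, one would expect get_ipv4_address("eth2") to return '9.30.75.21' and get_ipv4_address("eth0") to raise OSError with errno 99. In Python 3, IOError is an alias of OSError, so this is the same exception class as in the traceback.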


Running 'python /usr/lib/python2.6/site-packages/ambari_agent/Facter.py' manually on the nodes that
failed to register produced the same error.

Running it on nodes where registration succeeded prints a JSON-like dictionary of facts instead, for example:

{'kernel': 'Linux', 'domain': 'svl.ibm.com', 'kernelrelease': '2.6.32-504.el6.x86_64', 'uptime_days':
'0', 'memorytotal': 49413988, 'swapfree': '8.00 GB', 'processorcount': 24, 'selinux': False,
'timezone': 'PST', 'hardwareisa': 'x86_64', 'operatingsystem': 'redhat', 'hostname': 'hdperf014',
'id': 'root', 'memoryfree': 48185456, 'hardwaremodel': 'x86_64', 'uptime_seconds': '11578',
'osfamily': 'redhat', 'memorysize': 49413988, 'interfaces': 'eth0,lo', 'physicalprocessorcount':
24, 'swapsize': '8.00 GB', 'netmask': '255.255.255.0', 'ipaddress': '9.30.75.23', 'kernelmajversion':
'2.6', 'kernelversion': '2.6.32', 'macaddress': '00:02:C9:4B:57:62', 'operatingsystemrelease':
'6.6', 'uptime_hours': '3', 'fqdn': 'hdperf014.svl.ibm.com', 'architecture': 'x86_64'}


On a failing node, ifconfig shows eth0 as UP but without an inet addr:

[root@xxxxx ambari-agent]# ifconfig
eth0      Link encap:Ethernet  HWaddr 5C:F3:FC:A6:48:B4
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:93360000-9337ffff

eth2      Link encap:Ethernet  HWaddr 00:02:C9:4B:57:CE
          inet addr:9.30.75.21  Bcast:9.30.75.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:48830 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25329 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:2000
          RX bytes:64833325 (61.8 MiB)  TX bytes:2582433 (2.4 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:21 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1560 (1.5 KiB)  TX bytes:1560 (1.5 KiB)
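
To spot nodes in this state before starting the agent, ifconfig output like the above can be scanned for interfaces that are flagged UP but have no "inet addr:" line. The rough Python sketch below assumes the older net-tools layout shown here (newer ifconfig versions and "ip addr" print a different format); the function name up_interfaces_without_ipv4 is hypothetical, and SAMPLE is a trimmed copy of the output above.

```python
import re


def up_interfaces_without_ipv4(ifconfig_output):
    """Return names of interfaces flagged UP that have no 'inet addr:' line.

    Assumes the net-tools ifconfig layout: one block per interface,
    blocks separated by blank lines, interface name in the first column.
    """
    missing = []
    for block in re.split(r'\n\s*\n', ifconfig_output.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        name = lines[0].split()[0]
        # The flags line starts with "UP" only when the interface is up.
        is_up = any(re.match(r'\s*UP\b', line) for line in lines[1:])
        has_ipv4 = any('inet addr:' in line for line in lines)
        if is_up and not has_ipv4:
            missing.append(name)
    return missing


SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 5C:F3:FC:A6:48:B4
          UP BROADCAST MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:02:C9:4B:57:CE
          inet addr:9.30.75.21  Bcast:9.30.75.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
"""

print(up_interfaces_without_ipv4(SAMPLE))  # ['eth0']
```

A non-empty result flags a node that is likely to hit this failure.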

The workaround is to take down the interface that has no IP address: ifconfig eth0 down

ifconfig then shows:

[root@hdperf012 ambari_agent]# ifconfig
eth2      Link encap:Ethernet  HWaddr 00:02:C9:4B:57:CE
          inet addr:9.30.75.21  Bcast:9.30.75.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:49006 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25420 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:2000
          RX bytes:64847473 (61.8 MiB)  TX bytes:2593953 (2.4 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:21 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1560 (1.5 KiB)  TX bytes:1560 (1.5 KiB)

ambari-agent then starts successfully.
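
Taking eth0 down works because the netmask lookup then never visits an interface without an address. A code-level alternative, sketched here as an assumption rather than the actual Ambari fix, is to skip interfaces whose address lookup fails with EADDRNOTAVAIL while searching for the interface that carries the primary IP, which is what getNetmask appears to be doing in the traceback. The names find_primary_interface and fake_lookup are hypothetical; the lookup parameter stands in for get_ip_address_by_ifname, and the comma-separated interfaces string mirrors the 'interfaces' fact in the Facter output.

```python
import errno


def find_primary_interface(primary_ip, interfaces, lookup):
    """Return the interface whose IPv4 address equals primary_ip, or None.

    interfaces: comma-separated names such as "eth0,eth2,lo".
    lookup: maps an interface name to its IPv4 address string and may
    raise OSError for an interface that has no address.
    """
    for name in (i.strip() for i in interfaces.split(',')):
        if not name:
            continue
        try:
            address = lookup(name).strip()
        except OSError as e:
            # Up but address-less interface: skip it instead of letting
            # the whole fact collection (and agent start) fail.
            if e.errno == errno.EADDRNOTAVAIL:
                continue
            raise  # any other error is still surfaced
        if address == primary_ip:
            return name
    return None


def fake_lookup(name):
    # Simulates the failing node: eth0 is UP with no IPv4 address.
    table = {"eth2": "9.30.75.21", "lo": "127.0.0.1"}
    if name not in table:
        raise OSError(errno.EADDRNOTAVAIL, "Cannot assign requested address")
    return table[name]


print(find_primary_interface("9.30.75.21", "eth0,eth2,lo", fake_lookup))  # eth2
```

This is written for Python 3, where IOError and OSError are the same class. Under the Python 2.6 interpreter in the traceback they are distinct, so the except clause there would need to catch IOError.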

The same machine did not hit this problem with a prior Ambari build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
