ambari-user mailing list archives

From Roberta Marton <roberta.mar...@esgyn.com>
Subject Unable to start NameNode after enabling Kerberos
Date Thu, 14 Apr 2016 00:24:35 GMT
I encountered an interesting problem and could not find anything written up on it.



After I enabled Kerberos through Ambari, the NameNode failed to start properly.

I see the following in the HDFS log until the retries finally time out:



. . .

2016-04-13 19:03:22,348 INFO  httpclient.HttpMethodDirector
(HttpMethodDirector.java:executeWithRetry(439)) - I/O exception
(java.net.ConnectException) caught when processing request: Connection
refused

2016-04-13 19:03:22,348 INFO  httpclient.HttpMethodDirector
(HttpMethodDirector.java:executeWithRetry(445)) - Retrying request

2016-04-13 19:03:22,348 INFO  httpclient.HttpMethodDirector
(HttpMethodDirector.java:executeWithRetry(439)) - I/O exception
(java.net.ConnectException) caught when processing request: Connection
refused

2016-04-13 19:03:22,348 INFO  httpclient.HttpMethodDirector
(HttpMethodDirector.java:executeWithRetry(445)) - Retrying request

2016-04-13 19:03:22,349 WARN  timeline.HadoopTimelineMetricsSink
(HadoopTimelineMetricsSink.java:putMetrics(214)) - Unable to send metrics
to collector by address:http://<hostname>:6188/ws/v1/timeline/metrics

2016-04-13 19:03:58,533 ERROR namenode.NameNode (LogAdapter.java:error(71))
- RECEIVED SIGNAL 15: SIGTERM

2016-04-13 19:03:58,542 INFO  namenode.NameNode (LogAdapter.java:info(47))
- SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at <hostname>/<IP>

I was able to perform Kerberos requests between <hostname> and the node where the KDC resides.

After all the retries fail, the output of the Ambari start operation is:

2016-04-13 19:04:03,724 -
File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action':
['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f
/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E
pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}

2016-04-13 19:04:03,730 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash
-c 'ulimit -c unlimited ;
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config
/usr/hdp/current/hadoop-client/conf start namenode''] {'environment':
{'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if':
'ambari-sudo.sh  -H -E test -f
/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh  -H -E
pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}

2016-04-13 19:04:07,826 - Execute['/usr/bin/kinit -kt
/etc/security/keytabs/hdfs.headless.keytab hdfs-<hostname>@<REALM>']
{'user': 'hdfs'}

2016-04-13 19:04:07,978 - Must wait to leave safemode since High
Availability is not enabled.

2016-04-13 19:04:07,978 - Checking the NameNode safemode status since may
need to transition from ON to OFF.

2016-04-13 19:04:07,979 - Execute['hdfs dfsadmin -fs
hdfs://<FQDNhostname>:8020 -safemode get | grep 'Safe mode is OFF'']
{'logoutput': True, 'tries': 180, 'user': 'hdfs', 'try_sleep': 10}

2016-04-13 19:04:11,185 - Retrying after 10 seconds. Reason: Execution of
'hdfs dfsadmin -fs hdfs://robertablue.novalocal:8020 -safemode get | grep
'Safe mode is OFF'' returned 1.

2016-04-13 19:04:24,289 - Retrying after 10 seconds. Reason: Execution of
'hdfs dfsadmin -fs hdfs://robertablue.novalocal:8020 -safemode get | grep
'Safe mode is OFF'' returned 1.

. . .
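
The safe-mode check in the output above is essentially a retry loop: run `hdfs dfsadmin -safemode get`, grep for `Safe mode is OFF`, and retry up to 180 times with a 10-second sleep (the `tries` and `try_sleep` values in the log). A rough bash sketch of that loop, for illustration only; it is not Ambari's actual code, and the function name `wait_for_safemode_off` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the retry loop visible in the Ambari output:
# run a safemode query, grep its output for "Safe mode is OFF", and retry
# up to TRIES times, sleeping TRY_SLEEP seconds between attempts.

# Usage: wait_for_safemode_off TRIES TRY_SLEEP CMD [ARGS...]
# Returns 0 as soon as CMD prints "Safe mode is OFF", 1 if every try fails.
wait_for_safemode_off() {
  local tries=$1 try_sleep=$2
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    # The pipeline's exit status is grep's: 0 only if the line was found.
    if "$@" 2>/dev/null | grep -q 'Safe mode is OFF'; then
      return 0
    fi
    # Sleep only if another attempt remains.
    if ((i < tries)); then
      echo "Retrying after ${try_sleep} seconds (attempt ${i}/${tries})" >&2
      sleep "$try_sleep"
    fi
  done
  return 1
}

# Invocation mirroring the logged values; <FQDNhostname> is a placeholder:
# wait_for_safemode_off 180 10 hdfs dfsadmin -fs hdfs://<FQDNhostname>:8020 -safemode get
```

With the logged values the loop can run for roughly half an hour (180 tries, 10 seconds apart) before giving up, which is why the start operation appears to hang while safe mode stays ON. HDFS also provides `hdfs dfsadmin -safemode wait`, which blocks until the NameNode leaves safe mode on its own.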



I then ran the same commands manually on <hostname>:

/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-<hostname>@<REALM>

klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs-<hostname>@<REALM>

Valid starting     Expires            Service principal
04/13/16 19:29:58  04/14/16 19:29:58  krbtgt/…
       renew until 04/20/16 19:29:58

hdfs dfsadmin -fs hdfs://<hostname>:8020 -safemode get
Safe mode is ON



I read that safe mode is turned on while the NameNode starts, so that part is expected. I am not sure what is supposed to turn it off. After trying several things, I simply restarted the NameNode through Ambari and, while Ambari was looping and waiting for safe mode to be turned off, I turned it off from another process. This time the NameNode and the rest of the installed services started, and I was able to access the system.
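
The post does not say exactly which command turned safe mode off. A minimal bash sketch of that workaround, assuming the standard `hdfs dfsadmin -safemode leave` subcommand and the keytab and principal from the `kinit` call above, might look like this. The `leave_safemode` function and the `KINIT`/`HDFS` overrides are hypothetical, added only so the commands can be stubbed out for a dry run.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround (assumed, not taken from the post):
# authenticate as the hdfs principal, show the current state, then ask the
# NameNode to leave safe mode. Placeholders must be filled in before use.
KEYTAB=/etc/security/keytabs/hdfs.headless.keytab
PRINCIPAL='hdfs-<hostname>@<REALM>'
NAMENODE='hdfs://<hostname>:8020'

# Hypothetical knobs so the real binaries can be swapped for stubs.
KINIT=${KINIT:-/usr/bin/kinit}
HDFS=${HDFS:-hdfs}

leave_safemode() {
  # Stop early if we cannot get a Kerberos ticket.
  "$KINIT" -kt "$KEYTAB" "$PRINCIPAL" || return 1
  "$HDFS" dfsadmin -fs "$NAMENODE" -safemode get
  "$HDFS" dfsadmin -fs "$NAMENODE" -safemode leave
}
```

Source the snippet in a second shell on the NameNode host and run `leave_safemode` while Ambari is still polling. Safe mode exists so the NameNode can wait for DataNodes to report enough blocks, and forcing it off skips that check, so it is worth confirming with `hdfs dfsadmin -report` that the DataNodes are live first.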



I later restarted HDFS and several other services through Ambari without incident.



    Regards,

    Roberta
