hadoop-common-user mailing list archives

From Benoy Antony <bant...@gmail.com>
Subject Re: Securing the Secondary Name Node
Date Wed, 18 Sep 2013 05:01:04 GMT
Chris,

I think the error occurs when the NN tries to download the new fsimage
from the SNN.
You can check the NN logs to confirm whether this is the case.
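
One way to narrow down the NN log is to keep only the lines from the second in which the checkpoint failed that mention the image-transfer classes. The sketch below is only an illustration: the helper name lines_near_failure and the keyword list are my own choices, and the sample lines are the NN log lines from the quoted message, unwrapped onto one line each.

```python
def lines_near_failure(log_lines, timestamp_prefix, keywords):
    """Keep log lines that start with the timestamp prefix and mention any keyword."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if line.startswith(timestamp_prefix) and any(k in line for k in keywords)
    ]


# Sample lines copied from the NN log excerpt in the quoted message (unwrapped).
sample = [
    "2013-09-10 09:44:26,220 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: "
    "Cannot roll edit log, edits.new files already exists in all healthy directories:",
    "2013-09-10 09:44:26,327 INFO org.apache.hadoop.hdfs.server.namenode.GetImageServlet: "
    "GetImageServlet allowing: hdfs/hpctest3.realm.com@REALM.COM",
]

# Hypothetical keyword list; adjust it to whatever classes you care about.
hits = lines_near_failure(
    sample, "2013-09-10 09:44:26", ["GetImageServlet", "TransferFsImage", "ERROR"]
)
for line in hits:
    print(line)
```

Against a real log you would pass an open file object instead of the sample list.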

There could be different reasons for this.

1. The NN fails to complete SPNEGO authentication with the SNN.
2. The NN's TGT has expired. This is unlikely in your test cluster.
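
For the first case it helps to know which address the NN is told to call back. My understanding (an assumption about Hadoop 1.x, not something the logs prove) is that the NN fetches the image from the machine and port that the SNN advertises in its putimage request. A minimal Python sketch that decodes the putimage URL from the SNN log quoted below; callback_address and is_wildcard are hypothetical helper names:

```python
from urllib.parse import urlsplit, parse_qs

# putimage URL copied from the SNN log in the quoted message.
PUTIMAGE_URL = (
    "http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090"
    "&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746"
    "&newChecksum=f6fbd8485835f40b5950ce36c4877504"
)


def callback_address(putimage_url):
    """Return the (machine, port) pair advertised in a putimage request URL."""
    query = parse_qs(urlsplit(putimage_url).query)
    return query["machine"][0], int(query["port"][0])


def is_wildcard(machine):
    """True if the advertised host is a wildcard bind address, not a real hostname."""
    return machine in ("0.0.0.0", "::")


machine, port = callback_address(PUTIMAGE_URL)
print(f"SNN advertised {machine}:{port}")
if is_wildcard(machine):
    # Assumption: a Kerberos service principal cannot be matched to a wildcard host.
    print("Advertised host is a wildcard address; check the SNN HTTP address setting.")
```

If the advertised host is a wildcard address rather than the SNN's real hostname, that would be the first thing I would look at, since SPNEGO needs a hostname that matches the HTTP principal.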

Please post any additional log output and I can help further.

benoy




On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cpenney@gmail.com> wrote:

> Does anyone have any suggestions or resources I might look at to resolve
> this?  The documentation on setting up Kerberos seems pretty light.
>
>    Chris
>
>
>
> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cpenney@gmail.com> wrote:
>
>>
>> Hi,
>>
>> After hosting an insecure Hadoop environment for early testing I'm
>> transitioning to something more secure that would (hopefully) more or less
>> mirror what a production environment might look like.  I've integrated our
>> Hadoop cluster into our Kerberos realm and everything is working ok except
>> for our secondary name node.  When I invoke the secondarynamenode with
>> "-checkpoint force" (when no other secondary name node process is running)
>> I get:
>>
>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>> /etc/hadoop/hdfs.keytab
>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>> to getimage
>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
>> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>> [/etc/hadoop/hdfs.keytab]
>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>> configuration not set, using a random value as secret
>> 13/09/10 09:44:26 INFO mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50090
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>> up at: 0.0.0.0:50090
>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>> :3600 secs (60 min)
>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>  :67108864 bytes (65536 KB)
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getimage=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>> fsimage size 110 bytes.
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getedit=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>> size 40 bytes.
>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>> dfs.block.invalidate.limit=100
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>> more than 10 times
>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
>> = 0
>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>> edits # 2 loaded in 0 seconds.
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>> seconds.
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>> hpctest3.realm.com:50070
>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
>> trying to open authenticated connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>
>> The name node log file has this in it:
>>
>> 2013-09-10 09:44:26,219 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>> removed>
>> 2013-09-10 09:44:26,220 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>>   /tmp/hadoop/hdfs/name/current/edits.new
>> 2013-09-10 09:44:26,327 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>> 2013-09-10 09:44:26,362 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>
>> I've tried googling this and looking in a few books, but I haven't found
>> much good advice.  I'm using Hadoop 1.1.1.  I'm happy to post more info,
>> but I didn't want to dump in a bunch of unnecessary stuff.
>>
>> Thanks,
>>
>>    Chris
>>
>
>
