hadoop-common-user mailing list archives

From Wei-Chiu Chuang <weic...@cloudera.com>
Subject Re: Secure Hadoop - invalid Kerberos principal errors
Date Thu, 20 Oct 2016 17:41:25 GMT
Instead of hard-coding the host name in the server principal,
have you tried using hdfs/_HOST@TNBSOUND.COM?

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Kerberos_principals_for_Hadoop_Daemons

>         <name>dfs.journalnode.kerberos.principal</name>
>         <value>hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
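For reference, here is a simplified Python sketch of the `_HOST` expansion, based on my reading of Hadoop 2.x's `SecurityUtil.getServerPrincipal`. The name `expand_principal` is hypothetical and this is not Hadoop's code. The principal is split on `/` and `@`; only if it has exactly three parts with the literal `_HOST` in the middle is that part replaced, by the lowercased host name (the daemon's own host when it logs in, the destination host when a client checks a server). A hard-coded principal comes back unchanged, so a client whose config names aw1hdnn001 expects that same principal from every server it contacts.

```python
import re
import socket
from typing import Optional

HOST_PLACEHOLDER = "_HOST"


def expand_principal(principal: str, hostname: Optional[str] = None) -> str:
    """Expand service/_HOST@REALM roughly the way Hadoop 2.x does (sketch)."""
    parts = re.split(r"[/@]", principal)
    if len(parts) != 3 or parts[1] != HOST_PLACEHOLDER:
        # Hard-coded (or malformed) principals are returned untouched.
        return principal
    fqdn = hostname
    if not fqdn or fqdn == "0.0.0.0":
        # Missing or wildcard host: fall back to this machine's FQDN
        # (socket.getfqdn() approximates what the JVM would report).
        fqdn = socket.getfqdn()
    return f"{parts[0]}/{fqdn.lower()}@{parts[2]}"


if __name__ == "__main__":
    print(expand_principal("hdfs/_HOST@TNBSOUND.COM", "AW1HDNN002.tnbsound.com"))
    print(expand_principal("hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM",
                           "aw1hdnn002.tnbsound.com"))
```

Run as a script, the first call prints hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM and the second prints the hard-coded aw1hdnn001 principal unchanged, regardless of which server is being contacted.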

Wei-Chiu Chuang
A very happy Clouderan

> On Oct 20, 2016, at 10:19 AM, Mark Selby <mselby@pandora.com> wrote:
> 
> We have an existing CDH 5.5.1 cluster with simple authentication and no authorization.
> We are building out a new cluster and plan to move to CDH 5.8.2 with Kerberos-based
> authentication. We have an existing MIT Kerberos infrastructure which we successfully
> use for a variety of services (SSH, Apache, Postfix).
> 
> I am very confident that our /etc/krb5.conf and name resolution are working. I have even
> used HadoopDNSVerifier-1.0.jar to verify that Java sees the same name canonicalization that
> we see.
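For anyone repeating that check without the jar, a minimal forward/reverse lookup sketch is below. It relies only on the standard `socket` resolver calls; it is not what HadoopDNSVerifier does internally, and `check_canonicalization` is a hypothetical name. A host passes when resolving its name to an IP and that IP back to a name yields the same FQDN, compared case-insensitively because Hadoop lowercases host names it puts into principals. The resolvers are parameters so the logic can be exercised with fakes.

```python
import socket
import sys
from typing import Callable, List, Tuple

Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def check_canonicalization(host: str,
                           forward: Forward = socket.gethostbyname,
                           reverse: Reverse = socket.gethostbyaddr) -> Tuple[str, str, bool]:
    """Resolve host -> IP -> name and report whether the round trip is consistent."""
    ip = forward(host)
    reverse_name = reverse(ip)[0]
    # Ignore case and a trailing dot when comparing the two names.
    consistent = reverse_name.rstrip(".").lower() == host.rstrip(".").lower()
    return ip, reverse_name, consistent


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, *check_canonicalization(name))
```

Since the expected principal is computed on the client side, it is worth running this (for example with aw1hdnn001, aw1hdnn002 and aw1hdrm001 as arguments) from every host that acts as a client, not just from one.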
> 
> I have built a test cluster and closely followed the instructions in the secure Hadoop
> installation doc on the Cloudera site, making sure that all the conf files are properly edited
> and all the Kerberos keytabs contain the correct principals and have the correct permissions.
> 
> We are using HA NameNodes with quorum-based JournalNodes (QJM).
> 
> I am running into a persistent problem with many Hadoop components when they need to talk
> securely to remote servers. The two examples I post here are the NameNode needing to talk
> to remote JournalNodes, and the command-line hdfs client needing to speak to a remote NameNode.
> Both give the same error:
> 
> Server has invalid Kerberos principal: hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdnn002.tnbsound.com":8020;
> 
> There is not much on the web about this, and the error that shows up leads me to
> believe that the issue is around the Kerberos realm being used in one place and not the
> other.
> 
> I just cannot seem to figure out what is going on here, as I know these are valid principals.
> I have added a snippet at the end where I enabled Kerberos debugging, in case that helps.
> 
> The weird part is that this error applies only to remote daemons. The local NameNode
> and JournalNode do not have the issue. We can “speak” locally but not remotely.
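The local-versus-remote asymmetry fits a simplified model of the client-side check. As I understand Hadoop 2.x (the check lives in `SaslRpcClient`), before authenticating, the client expands the principal from its own configuration (for example `dfs.namenode.kerberos.principal` or `dfs.journalnode.kerberos.principal`) against the server's host name and compares the result with the principal the server advertises; a mismatch produces this `Server has invalid Kerberos principal` error. With hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM hard-coded on aw1hdnn001, the expected principal is the same for every server, so only the local daemons pass. The sketch below is illustrative Python with hypothetical names, not Hadoop's code, and it ignores other options the real check supports.

```python
import re


class InvalidServerPrincipal(ValueError):
    """Raised when the advertised principal differs from the expected one."""


def expected_principal(configured: str, server_host: str) -> str:
    # Replace only a literal "_HOST" host component; anything else is used as-is.
    parts = re.split(r"[/@]", configured)
    if len(parts) != 3 or parts[1] != "_HOST":
        return configured
    return f"{parts[0]}/{server_host.lower()}@{parts[2]}"


def check_server_principal(configured: str, advertised: str, server_host: str) -> None:
    """Simplified client-side check; raises if the server is not who we expect."""
    expected = expected_principal(configured, server_host)
    if advertised != expected:
        # The "(expected ...)" suffix is added here for clarity only.
        raise InvalidServerPrincipal(
            f"Server has invalid Kerberos principal: {advertised} (expected {expected})")


if __name__ == "__main__":
    nn2_host = "aw1hdnn002.tnbsound.com"
    nn2_principal = "hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM"
    for configured in ("hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM",
                       "hdfs/_HOST@TNBSOUND.COM"):
        try:
            check_server_principal(configured, nn2_principal, nn2_host)
            print(configured, "-> OK")
        except InvalidServerPrincipal as err:
            print(configured, "->", err)
```

In this model the hard-coded principal fails against nn2 while `hdfs/_HOST@TNBSOUND.COM` passes, which matches both the NameNode-to-JournalNode and the dfsadmin failures above.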
> 
> Any and all help is greatly appreciated.
> 
> #
> # This is me with hdfs Kerberos credentials, running hdfs dfsadmin -refreshServiceAcl
> #
> 
> hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 53$ klist
> Ticket cache: FILE:/tmp/krb5cc_115
> Default principal: hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM
> Valid starting Expires Service principal
> 10/20/2016 15:34:49 10/21/2016 15:34:49 krbtgt/TNBSOUND.COM@TNBSOUND.COM
> renew until 10/27/2016 15:34:49
> 
> hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 54$ hdfs dfsadmin -refreshServiceAcl
> Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
> refreshServiceAcl: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdnn002.tnbsound.com":8020;
> 
> #
> # This is the NameNode trying to start up and contact an off-server JournalNode
> #
> 2016-10-20 16:51:40,703 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM (auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException:
Server has invalid Kerberos principal: hdfs/aw1hdrm001.tnbsound.com@TNBSOUND.COM
> 10.132.8.21:8485: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException:
Server has invalid Kerberos principal: hdfs/aw1hdrm001.tnbsound.com@TNBSOUND.COM; Host Details
: local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdrm001.tnbsound.com":8485;

> 
> #
> # This is me with hdfs Kerberos credentials, running hdfs dfsadmin -refreshServiceAcl with Kerberos debugging enabled
> #
> hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 46$ HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfsadmin -refreshServiceAcl
> Java config name: null
> Native config name: /etc/krb5.conf
> Loaded from native config
> >>>KinitOptions cache name is /tmp/krb5cc_115
> >>>DEBUG <CCacheInputStream> client principal is hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> server principal is krbtgt/TNBSOUND.COM@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> key type: 18
> >>>DEBUG <CCacheInputStream> auth time: Thu Oct 20 16:55:42 UTC 2016
> >>>DEBUG <CCacheInputStream> start time: Thu Oct 20 16:55:42 UTC 2016
> >>>DEBUG <CCacheInputStream> end time: Fri Oct 21 16:55:42 UTC 2016
> >>>DEBUG <CCacheInputStream> renew_till time: Thu Oct 27 16:55:42 UTC 2016
> >>> CCacheInputStream: readFlags() FORWARDABLE; PROXIABLE; RENEWABLE; INITIAL; PRE_AUTH;
> >>>DEBUG <CCacheInputStream> client principal is hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/TNBSOUND.COM@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> key type: 0
> >>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
> >>>DEBUG <CCacheInputStream> start time: null
> >>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
> >>>DEBUG <CCacheInputStream> renew_till time: null
> >>> CCacheInputStream: readFlags() 
> >>>DEBUG <CCacheInputStream> client principal is hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/TNBSOUND.COM@TNBSOUND.COM
> >>>DEBUG <CCacheInputStream> key type: 0
> >>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
> >>>DEBUG <CCacheInputStream> start time: null
> >>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
> >>>DEBUG <CCacheInputStream> renew_till time: null
> >>> CCacheInputStream: readFlags() 
> Found ticket for hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM to go to krbtgt/TNBSOUND.COM@TNBSOUND.COM expiring on Fri Oct 21 16:55:42 UTC 2016
> Entered Krb5Context.initSecContext with state=STATE_NEW
> Found ticket for hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM to go to krbtgt/TNBSOUND.COM@TNBSOUND.COM expiring on Fri Oct 21 16:55:42 UTC 2016
> Service ticket not found in the subject
> >>> Credentials acquireServiceCreds: same realm
> Using builtin default etypes for default_tgs_enctypes
> default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
> >>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
> >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
> >>> KdcAccessibility: reset
> >>> KrbKdcReq send: kdc=dc1util003.tnbsound.com UDP:88, timeout=30000, number of retries =3, #bytes=734
> >>> KDCCommunication: kdc=dc1util003.tnbsound.com UDP:88, timeout=30000,Attempt =1, #bytes=734
> >>> KrbKdcReq send: #bytes read=721
> >>> KdcAccessibility: remove dc1util003.tnbsound.com
> >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
> >>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
> >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
> Krb5Context setting mySeqNumber to: 561537595
> Created InitSecContextToken:
> 0000: 01 00 6E 82 02 7F 30 82 02 7B A0 03 02 01 05 A1 ..n...0.........
> 0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01 ......... ......
> 0020: 7A 61 82 01 76 30 82 01 72 A0 03 02 01 05 A1 0E za..v0..r.......
> 0030: 1B 0C 54 4E 42 53 4F 55 4E 44 2E 43 4F 4D A2 2A ..TNBSOUND.COM.*
> 0040: 30 28 A0 03 02 01 00 A1 21 30 1F 1B 04 68 64 66 0(......!0...hdf
> 0050: 73 1B 17 61 77 31 68 64 6E 6E 30 30 31 2E 74 6E s..aw1hdnn001.tn
> 0060: 62 73 6F 75 6E 64 2E 63 6F 6D A3 82 01 2D 30 82 bsound.com...-0.
> 0070: 01 29 A0 03 02 01 12 A1 03 02 01 01 A2 82 01 1B .)..............
> 0080: 04 82 01 17 04 6E 26 46 08 EA 9C 61 08 80 B8 4B .....n&F...a...K
> 0090: AF 7C D2 CD 5E 47 19 3D A1 FB CD 8D 41 F4 C9 49 ....^G.=....A..I
> 00A0: 09 95 1C C7 9A D8 1B 92 0F 3C E0 5F 41 BF 99 96 .........<._A...
> 00B0: 42 A9 2D 17 D6 F0 AB 41 72 3E 7E F7 13 33 E2 0A B.-....Ar>...3..
> 00C0: 2D F5 71 AD 97 9A 9D 7F E0 EA 1A 29 7C D4 47 AB -.q........)..G.
> 00D0: B4 7E C1 A1 C5 28 DD 46 F1 C4 17 0B FC DB C9 D3 .....(.F........
> 00E0: F4 4D C2 1F 6C 59 A6 C4 9E 9D FD 56 E3 B0 31 E6 .M..lY.....V..1.
> 00F0: C6 6E 50 44 2C 07 44 91 40 F7 C8 6E AD 1E FB 26 .nPD,.D.@..n...&
> 0100: EC 6D E4 ED BC F8 15 17 0B 31 B6 4B 68 64 03 E4 .m.......1.Khd..
> 0110: 28 9B A5 9D AE 2A DF 1B BD 0F B2 AE B3 BB E0 4D (....*.........M
> 0120: 14 D1 9C E0 AC 99 59 1B B6 28 22 E2 B5 55 52 58 ......Y..("..URX
> 0130: D2 61 39 DE 8F C8 3F E6 6F EB 41 5D E1 F2 43 40 .a9...?.o.A]..C@
> 0140: 8F AC 78 C8 09 35 7B BA 39 6B CD C6 01 7B 90 0B ..x..5..9k......
> 0150: 20 0C 49 0D 8B E5 2B F1 E6 6F 38 4E EA DF 5C A9 .I...+..o8N..\.
> 0160: 40 AE 11 75 AE B2 E2 35 13 A8 CE CF E7 F5 92 CB @..u...5........
> 0170: A5 66 53 47 92 5A EF 31 CD 60 CD 67 46 D0 B7 0D .fSG.Z.1.`.gF...
> 0180: B6 76 FE 09 B1 03 16 FE B8 57 6E 08 9A E6 DD F8 .v.......Wn.....
> 0190: D3 AA 00 54 6C D4 70 61 95 08 CF A4 81 E7 30 81 ...Tl.pa......0.
> 01A0: E4 A0 03 02 01 12 A2 81 DC 04 81 D9 4E 48 9E 35 ............NH.5
> 01B0: 57 7C 7C 54 1C 9F 41 FE F3 C0 94 07 E2 D8 EE 38 W..T..A........8
> 01C0: BA 4A DA 97 43 04 B5 96 F6 A9 34 FD 54 FF 7B 96 .J..C.....4.T...
> 01D0: DA DD A9 6F C4 7B A5 E4 50 9F 9E 1A 62 D3 F3 3C ...o....P...b..<
> 01E0: 50 50 E9 02 05 F2 37 52 4D BC 86 D8 2B A4 9F FE PP....7RM...+...
> 01F0: 97 4C 01 7F E6 B4 8B 66 1F 6E 63 FD 3F EF 57 E9 .L.....f.nc.?.W.
> 0200: 04 E9 BE 28 4C 03 BC 26 EB EF EC DC 8C 48 C0 51 ...(L..&.....H.Q
> 0210: 7B 2B 5B 0F 16 7C 83 D0 73 F9 2A 94 CF 67 F2 F8 .+[.....s.*..g..
> 0220: 11 CC 2B E9 0D FE 95 F5 7E 2B C4 40 19 FE FE 6F ..+......+.@...o
> 0230: B7 C4 B8 7E 87 D1 0A 98 8A F2 B0 1A DF FA 27 24 ..............'$
> 0240: C2 EE 06 FE 3F 36 57 3D 6C B9 F3 18 98 19 D6 A1 ....?6W=l.......
> 0250: F4 49 57 5D 58 6E 88 C9 2E 1F FA 7D 53 24 B9 67 .IW]Xn......S$.g
> 0260: 02 85 C2 2C 01 25 18 BA BF 0E 64 A2 C3 06 7D AC ...,.%....d.....
> 0270: D6 11 A6 F4 ED 47 71 22 CC D4 E8 54 08 17 51 E6 .....Gq"...T..Q.
> 0280: EE 6F FE 31 37 .o.17
> Entered Krb5Context.initSecContext with state=STATE_IN_PROCESS
> >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
> Krb5Context setting peerSeqNumber to: 374605590
> Krb5Context.unwrap: token=[05 04 01 ff 00 0c 00 00 00 00 00 00 16 54 07 16 01 01 00 00 c5 67 32 c5 74 d0 68 ef 82 46 a8 85 ]
> Krb5Context.unwrap: data=[01 01 00 00 ]
> Krb5Context.wrap: data=[01 01 00 00 ]
> Krb5Context.wrap: token=[05 04 00 ff 00 0c 00 00 00 00 00 00 21 78 62 3b 01 01 00 00 a1 51 c9 92 95 bd cd 88 66 59 b7 49 ]
> Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
> refreshServiceAcl: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/aw1hdnn002.tnbsound.com@TNBSOUND.COM; Host Details : local host is: "aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: "aw1hdnn002.tnbsound.com":8020;
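Reading the hex dump: the InitSecContextToken is a DER-encoded Kerberos AP-REQ, and the ticket inside it carries the service realm and the principal's name components as ASN.1 GeneralStrings (tag 0x1B, then a length byte, then ASCII bytes). The heuristic sketch below is not a real ASN.1 parser, and `general_strings` is a hypothetical helper; it scans rows 0x0030 to 0x0060 of the dump and recovers TNBSOUND.COM, hdfs and aw1hdnn001.tnbsound.com, i.e. a service ticket for hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM. Together with the single KDC request in the log, this is consistent with only the aw1hdnn001 connection reaching the Kerberos exchange, while the aw1hdnn002 connection is rejected by the principal check before any service ticket is requested.

```python
from typing import List

GENERAL_STRING_TAG = 0x1B  # ASN.1 universal tag 27 (GeneralString)


def general_strings(data: bytes, min_len: int = 2) -> List[str]:
    """Heuristically extract short-form, printable GeneralStrings from DER bytes."""
    found = []
    i = 0
    while i + 1 < len(data):
        length = data[i + 1]
        end = i + 2 + length
        if data[i] == GENERAL_STRING_TAG and min_len <= length < 0x80 and end <= len(data):
            chunk = data[i + 2:end]
            if all(0x20 <= b < 0x7F for b in chunk):
                found.append(chunk.decode("ascii"))
                i = end  # skip past the string just consumed
                continue
        i += 1
    return found


# Rows 0x0030 to 0x0060 of the InitSecContextToken dump in the debug output.
DUMP = bytes.fromhex(
    "1B 0C 54 4E 42 53 4F 55 4E 44 2E 43 4F 4D A2 2A "
    "30 28 A0 03 02 01 00 A1 21 30 1F 1B 04 68 64 66 "
    "73 1B 17 61 77 31 68 64 6E 6E 30 30 31 2E 74 6E "
    "62 73 6F 75 6E 64 2E 63 6F 6D A3 82 01 2D 30 82"
)

if __name__ == "__main__":
    realm, *name = general_strings(DUMP)
    print("/".join(name) + "@" + realm)  # hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM
```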
> 
> #
> # hdfs-site.xml
> #
> 
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
> 
>     <!--               -->
>     <!-- HDFS security -->
>     <!--               -->
> 
>     <property>
>         <name>dfs.block.access.token.enable</name>
>         <value>true</value>
>     </property>
> 
>     <!--              -->
>     <!-- HA namespace -->
>     <!--              -->
> 
>     <property>
>         <name>dfs.nameservices</name>
>         <value>nbs-aw1-test</value>
>     </property>
> 
>     <!--              -->
>     <!-- HA namenodes -->
>     <!--              -->
> 
>     <property>
>         <name>dfs.ha.namenodes.nbs-aw1-test</name>
>         <value>nn1,nn2</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.rpc-address.nbs-aw1-test.nn1</name>
>         <value>aw1hdnn001.tnbsound.com:8020</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.http-address.nbs-aw1-test.nn1</name>
>         <value>aw1hdnn001.tnbsound.com:50070</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.rpc-address.nbs-aw1-test.nn2</name>
>         <value>aw1hdnn002.tnbsound.com:8020</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.http-address.nbs-aw1-test.nn2</name>
>         <value>aw1hdnn002.tnbsound.com:50070</value>
>     </property>
> 
>     <!--              -->
>     <!-- FS image dir -->
>     <!--              -->
> 
>     <property>
>         <name>dfs.namenode.name.dir</name>
>         <value>/var/lib/hadoop-hdfs/dfs/name</value>
>     </property>
> 
>     <!--            -->
>     <!-- QJM config -->
>     <!--            -->
> 
>     <property>
>         <name>dfs.namenode.shared.edits.dir</name>
>         <value>qjournal://aw1hdnn001.tnbsound.com:8485;aw1hdnn002.tnbsound.com:8485;aw1hdrm001.tnbsound.com:8485/nbs-aw1-test</value>
>     </property>
> 
>     <property>
>         <name>dfs.journalnode.edits.dir</name>
>         <value>/var/lib/hadoop-hdfs/dfs/journal</value>
>     </property>
> 
>     <!--                      -->
>     <!-- JournalNode security -->
>     <!--                      -->
> 
>     <property>
>         <name>dfs.journalnode.keytab.file</name>
>         <value>/etc/krb5/hdfs.keytab</value>
>     </property>
> 
>     <property>
>         <name>dfs.journalnode.kerberos.principal</name>
>         <value>hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <property>
>         <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
>         <value>HTTP/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <!--                   -->
>     <!-- Namenode failover -->
>     <!--                   -->
> 
>     <property>
>         <name>dfs.client.failover.proxy.provider.nbs-aw1-test</name>
>         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>     </property>
> 
>     <property>
>         <name>dfs.ha.automatic-failover.enabled</name>
>         <value>true</value>
>     </property>
> 
>     <property>
>         <name>dfs.ha.fencing.methods</name>
>         <value>sshfence
>         shell(/bin/true)</value>
>     </property>
> 
>     <property>
>         <name>dfs.ha.fencing.ssh.private-key-files</name>
>         <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
>     </property>
> 
>     <property>
>         <name>dfs.ha.fencing.ssh.connect-timeout</name>
>         <value>3000</value>
>     </property>
> 
>     <property>
>         <name>ha.zookeeper.quorum</name>
>         <value>aw1zook001.tnbsound.com:2181,aw1zook002.tnbsound.com:2181,aw1zook003.tnbsound.com:2181</value>
>     </property>
> 
>     <!--                   -->
>     <!-- NameNode security -->
>     <!--                   -->
> 
>     <property>
>         <name>dfs.namenode.keytab.file</name>
>         <value>/etc/krb5/hdfs.keytab</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.kerberos.principal</name>
>         <value>hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <property>
>         <name>dfs.namenode.kerberos.internal.spnego.principal</name>
>         <value>HTTP/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <!--          -->
>     <!-- Datanode -->
>     <!--          -->
> 
>     <property>
>         <name>dfs.datanode.data.dir</name>
>         <value>/data01/hadoop-hdfs/dfs/data,/data02/hadoop-hdfs/dfs/data,/data03/hadoop-hdfs/dfs/data,/data04/hadoop-hdfs/dfs/data</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.failed.volumes.tolerated</name>
>         <value>0</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
>         <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
>         <value>107374182400</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
>         <value>0.75</value>
>     </property>
> 
>     <!--                   -->
>     <!-- DataNode security -->
>     <!--                   -->
> 
>     <property>
>         <name>dfs.datanode.data.dir.perm</name>
>         <value>700</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.keytab.file</name>
>         <value>/etc/krb5/hdfs.keytab</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.kerberos.principal</name>
>         <value>hdfs/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <property>
>         <name>dfs.datanode.address</name>
>         <value>0.0.0.0:1004</value>
>     </property>
>  
>     <property>
>         <name>dfs.datanode.http.address</name>
>         <value>0.0.0.0:1006</value>
>     </property>
> 
>     <!--      -->
>     <!-- Misc -->
>     <!--      -->
> 
>     <property>
>         <name>dfs.replication</name>
>         <value>3</value>
>     </property>
> 
>     <property>
>         <name>dfs.permissions.superusergroup</name>
>         <value>hadoop</value>
>     </property>
> 
>     <property>
>         <name>dfs.webhdfs.enabled</name>
>         <value>true</value>
>     </property>
> 
>     <property>
>         <name>dfs.hosts.exclude</name>
>         <value>/etc/hadoop/conf/hosts.exclude</value>
>         <final>true</final>
>     </property>
> 
>     <!--
>     From O'Reilly Hadoop Operations: A general guideline for setting
>     dfs.namenode.handler.count is to make it the natural logarithm of
>     the number of cluster nodes times 20 (as a whole number):
>     python3 -c 'import math; print(int(math.log(num_of_nodes) * 20))'
>     -->
>     <property>
>         <name>dfs.namenode.handler.count</name>
>         <value>24</value>
>     </property>
> 
>     <!--              -->
>     <!-- Web security -->
>     <!--              -->
> 
>     <property>
>         <name>dfs.web.authentication.kerberos.keytab</name>
>         <value>/etc/krb5/hdfs.keytab</value>
>     </property>
> 
>     <property>
>         <name>dfs.web.authentication.kerberos.principal</name>
>         <value>HTTP/aw1hdnn001.tnbsound.com@TNBSOUND.COM</value>
>     </property>
> 
>     <property>
>         <name>dfs.http.policy</name>
>         <value>HTTP_ONLY</value>
>     </property>
> 
> </configuration>
> 
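To find every place that needs the `_HOST` treatment, a small scanner can help. The sketch below uses only Python's standard `xml.etree.ElementTree`; `hardcoded_principals` and the script name in the usage comment are hypothetical, and it assumes principals follow the usual service/host@REALM shape. Run against the hdfs-site.xml above, it would flag all six `*.principal` properties (the JournalNode, NameNode and DataNode principals, the two `internal.spnego` principals and `dfs.web.authentication.kerberos.principal`), since each one hard-codes aw1hdnn001.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def hardcoded_principals(xml_data: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (name, value) for *.principal properties whose host part is not _HOST."""
    root = ET.fromstring(xml_data)
    flagged = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if not name.endswith(".principal"):
            continue
        # Expect service/host@REALM; inspect only the host component.
        service_and_host = value.split("@", 1)[0]
        if "/" in service_and_host and service_and_host.split("/", 1)[1] != "_HOST":
            flagged.append((name, value))
    return flagged


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_principals.py /etc/hadoop/conf/hdfs-site.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:  # bytes, so any XML encoding declaration is honoured
            for name, value in hardcoded_principals(fh.read()):
                print(f"{path}: {name} = {value}")
```

It only inspects the files it is given, so any other *-site.xml file that carries principals needs its own pass.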

