hadoop-common-issues mailing list archives

From "Demetrios Dimatos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8828) Support distcp from secure to insecure clusters
Date Thu, 02 Feb 2017 02:13:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849295#comment-15849295 ]

Demetrios Dimatos commented on HADOOP-8828:

I am wondering if there are any updates on this before I invest time digging deeper for a solution.
I too have seen this issue as described, going from a Kerberized secure Hadoop 2.7.3 cluster
to an insecure Hadoop 2.7.3 cluster. My use case is below; let me know if there is anything
more I can help with while I have the configuration available.

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://secure.com/tmp/tests/test.txt

The distcp command returns a java.io.EOFException, and the NN log contains a "NoMatchingRule:
No rules applied to..." error.
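
For context, "NoMatchingRule: No rules applied to ..." is what Hadoop's principal-to-short-name
mapping reports when no hadoop.security.auth_to_local rule matches a principal, which is the
situation item 3 of the issue description quoted below describes. The following is a simplified
Python model of that mapping, not Hadoop's actual implementation. The realm names, the principal
"alice", and all function names are hypothetical and chosen only for illustration.

```python
import re


class NoMatchingRule(Exception):
    """Raised when no mapping rule applies to a principal."""


def parse_principal(principal):
    """Split 'name[/host]@REALM' into (list of name components, realm)."""
    name, _, realm = principal.partition("@")
    return name.split("/"), realm


def default_rule(default_realm):
    """Model of DEFAULT: keep the first component if the realm is the local default."""
    def apply(components, realm):
        return components[0] if realm == default_realm else None
    return apply


def custom_rule(num_components, fmt, match, sed_from, sed_to):
    """Simplified model of RULE:[n:fmt](match)s/sed_from/sed_to/."""
    def apply(components, realm):
        if len(components) != num_components:
            return None
        # $0 stands for the realm, $1..$n for the name components.
        base = fmt.replace("$0", realm)
        for i, comp in enumerate(components, start=1):
            base = base.replace(f"${i}", comp)
        # The rule only applies if the whole formatted string matches.
        if not re.fullmatch(match, base):
            return None
        return re.sub(sed_from, sed_to, base, count=1)
    return apply


def short_name(principal, rules):
    """Return the first non-None rule result, or raise NoMatchingRule."""
    components, realm = parse_principal(principal)
    for rule in rules:
        result = rule(components, realm)
        if result is not None:
            return result
    raise NoMatchingRule(f"No rules applied to {principal}")


# Hypothetical setup: the insecure NN's default realm differs from the user's realm.
DEFAULT_ONLY = [default_rule("INSECURE.EXAMPLE.COM")]

# Models a custom mapping on the insecure NN such as:
#   RULE:[1:$1@$0](.*@SECURE\.EXAMPLE\.COM)s/@.*//
#   DEFAULT
WITH_CUSTOM_RULE = [
    custom_rule(1, "$1@$0", r".*@SECURE\.EXAMPLE\.COM", r"@.*", ""),
    default_rule("INSECURE.EXAMPLE.COM"),
]
```

In this model, short_name("alice@SECURE.EXAMPLE.COM", DEFAULT_ONLY) raises NoMatchingRule,
while the same call with WITH_CUSTOM_RULE maps the principal to "alice". Whether a mapping
like this resolves the EOFException seen above is not established here.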

> Support distcp from secure to insecure clusters
> -----------------------------------------------
>                 Key: HADOOP-8828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8828
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>            Reporter: Eli Collins
>            Assignee: Haohui Mai
> Users currently can't distcp from secure to insecure clusters.
> Relevant background from ATM:
> There's no plumbing to make the HFTP client use AuthenticatedURL in the case security
is enabled. This means that even though you have the servlet filter correctly configured on
the server, the client doesn't know how to properly authenticate to that filter.
> The crux of the issue is that security is enabled globally instead of per-file system.
The trick of using HFTP as the source FS works when the source is insecure, but not when the source
is secure.
> Normal cp with two hdfs:// URLs can be made to work. There is indeed logic in o.a.h.ipc.Client
to fall back to using simple authentication if your client config has security enabled (hadoop.security.authentication
set to "kerberos") and the server responds with a response for simple authentication. The thing
is, there are at least 3 bugs with this that I bumped into. All three can be worked around.
> 1) If your client config has security enabled you *must* have a valid Kerberos TGT, even
if you're interacting with an insecure cluster. The hadoop client unfortunately tries to read
the local ticket cache before it tries to connect to the server, and so doesn't know that
it won't need Kerberos credentials.
> 2) Even though the destination NN is insecure, it has to have a Kerberos principal created
for it. You don't need a keytab, and you don't need to change any settings on the destination
NN. The principal just needs to exist in the principal database. This is again because the
hadoop client will, before connecting to the remote NN, try to get a service ticket for the
hdfs/f.q.d.n principal for the remote NN. If this fails, it won't even get to the part where
it tries to connect to the insecure NN and falls back to simple auth.
> 3) Once you get through problems 1 and 2, you will try to connect to the remote, insecure
NN. This will work, but the reported principal name of your user will include a realm that
the remote NN doesn't know about. You will either need to change the default_realm setting
in /etc/krb5.conf on the insecure NN to be the same as the secure NN, or you will need to
add some custom hadoop.security.auth_to_local mappings on the insecure NN so it knows how
to translate this long principal name into a short name.
> Even with all these changes, distcp still won't work since the first thing it tries to
do when submitting the job is to get a delegation token for all the involved NNs, which won't
work since the insecure NN isn't running a DT secret manager. I haven't been able to figure
out a way around this, except to make a custom distcp which doesn't necessarily do this.
