hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14556) S3A to support Delegation Tokens
Date Fri, 05 Oct 2018 12:42:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639773#comment-16639773
] 

Steve Loughran commented on HADOOP-14556:
-----------------------------------------

Patch 010. Contains a failed attempt to do a full e2e MR job with secure tokens.

I made the mistake of trying to get a clone of the (Working) {{ITestS3AMiniYarnCluster}} test
to now fetch delegation tokens, where the following sequence of problems were encountered,
in roughly this order. 

# YARN doesn't fetch DTs unless security is enabled. Action: enable security.
# MR reduce doesn't work with the local FS as the cluster FS when security is enabled, because
LocalFetcher demands the native libs at this point for secure Posix operations, and test suites
can't depend on that. Action: switch to HDFS
# HDFS is a PITA to bring up securely, Action, create new class {{MiniKerberizeHadoopCluster}}
tries to encapsulate the actions and config changes from some HDFS tests to bring up a secure
HDFS cluster.
# YARN is a PITA to bring up securely, and none of the YARN tests seem to do a full secure
MiniKDC + HDFS + YARN. Action. Try to get {{MiniKerberizeHadoopCluster}} to do this, too,
with a bit more guesswork on setttings
# Job submit fails as the NM can't auth with the DN, even though the junit thread can. DN
keeps breaking connections. Possibly some localhost/external IP address problem, which I can't
debug because I can't turn the firewall off. Action: give up on secure HDFS
# Switch to trying to use S3A itself as the cluster FS; lifting bits of code from (insecure)
MR atop HDFS test cases to get the relevant MR JAR up there.
# Test case fails {{testWordCount}} fails to submit job (server-side SASL problems where principal
doesn't have a full user/name@realm structure ("Empty nameString not allowed").
# That problem goes away if you run all three tests in the test suite (word count, FS access
and probe for the RM port being open). Action: run all three tests from the IDE. (note, they
set up individual YARN clusters each)
# Then you can submit work. But it doesn't run. And there's no logs other than this:

{code}
[2018-10-05 13:34:13.217]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider
class
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource
class
Oct 05, 2018 1:34:00 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider
with the scope "Singleton"
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider
with the scope "Singleton"
Oct 05, 2018 1:34:00 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider
with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{code}

I've given up on the test at this point. I'm submitting the patch, and I'm going to keep it
but set to skip, "broken", always for now. OR just tag & delete in my git repo. 

* I now understand why there aren't any end-to-end tests of MR across a secure mini-(kdc,
hdfs, yarn) cluster.
* the only way I can see this working is to not attempt wordcount but instead some trivial
AM which just checks for FS access and then exits. That way, all the MR pain is gone. Still
doesn't deal with secure yarn launch though.

Tested: s3 ireland. All tests working except

{code}
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 149.206 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob
[ERROR] testWordCount(org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob)  Time
elapsed: 99.301 s  <<< FAILURE!
java.lang.AssertionError: Job did not complete
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestDelegatedMRJob.testWordCount(ITestDelegatedMRJob.java:334)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Also saw a transient failure of {{ITestS3GuardToolDynamoDB}}, which I couldn't replicate standalone.
No obvious cause.

{code}
[ERROR] testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time
elapsed: 615.35 s  <<< ERROR!
java.lang.Exception: test timed out after 600000 milliseconds
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoffOnBatchWrite(DynamoDBMetadataStore.java:813)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:765)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerPut(DynamoDBMetadataStore.java:851)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.removeAuthoritativeDirFlag(DynamoDBMetadataStore.java:1080)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:1033)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:993)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:271)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandCLI(AbstractS3GuardToolTestBase.java:286)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

[ERROR] testSetCapacityFailFastIfNotGuarded(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)
 Time elapsed: 1.854 s  <<< ERROR!
java.io.FileNotFoundException: DynamoDB table 'd5ade866-32f4-42c7-b141-c8995c096a26' does
not exist in region eu-west-1; auto-creation is turned off
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initTable(DynamoDBMetadataStore.java:1213)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:356)
	at org.apache.hadoop.fs.s3a.s3guard.S3Guard.getMetadataStore(S3Guard.java:99)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:378)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3377)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:530)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$SetCapacity.run(S3GuardTool.java:513)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:353)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1552)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.run(AbstractS3GuardToolTestBase.java:115)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.lambda$testSetCapacityFailFastIfNotGuarded$2(AbstractS3GuardToolTestBase.java:331)
	at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:494)
	at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:380)
	at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:449)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testSetCapacityFailFastIfNotGuarded(AbstractS3GuardToolTestBase.java:330)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{code}





> S3A to support Delegation Tokens
> --------------------------------
>
>                 Key: HADOOP-14556
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14556
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-14556-001.patch, HADOOP-14556-002.patch, HADOOP-14556-003.patch,
HADOOP-14556-004.patch, HADOOP-14556-005.patch, HADOOP-14556-007.patch, HADOOP-14556-008.patch,
HADOOP-14556-009.patch, HADOOP-14556-010.patch, HADOOP-14556.oath-002.patch, HADOOP-14556.oath.patch
>
>
> S3A to support delegation tokens where
> * an authenticated client can request a token via {{FileSystem.getDelegationToken()}}
> * Amazon's token service is used to request short-lived session secret & id; these
will be saved in the token and  marshalled with jobs
> * A new authentication provider will look for a token for the current user and authenticate
the user if found
> This will not support renewals; the lifespan of a token will be limited to the initial
duration. Also, as you can't request an STS token from a temporary session, IAM instances
won't be able to issue tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message