hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13727) S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider.
Date Mon, 17 Oct 2016 22:33:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HADOOP-13727:
-----------------------------------
    Attachment: HADOOP-13727-branch-2.001.patch

I'm attaching patch 001.  We have confirmed through load testing in our EC2 environment that
this patch prevents the throttling problems we had seen.  I have also completed a full S3A
test run against us-west-2.

* Define {{SharedInstanceProfileCredentialsProvider}} as a subclass of {{InstanceProfileCredentialsProvider}},
which enforces creation of only a single instance.
* Change credential provider creation logic in {{S3AUtils}} to support use of the shared instance,
both in the default case and the case that the user has configured {{fs.s3a.aws.credentials.provider}}.
* Also change the logic of {{S3AUtils}} for more edge case validation, better error messages
and better readability (I hope).
* Update site documentation and core-default.xml to describe the new provider.
* Set up a new unit test suite, {{TestS3AAWSCredentialsProvider}}.  There were multiple tests
from {{ITestS3AAWSCredentialsProvider}} that didn't really need full S3 integration, so I've
moved them to the new unit test suite.  Now they'll run in pre-commit.  I also added new tests
for the new functionality and new validation logic.
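
As a rough illustration of the first bullet, here is a minimal sketch of a subclass that
allows only one instance per JVM.  It is not the code from the patch: {{MetadataProvider}}
is a hypothetical stand-in for the SDK's {{InstanceProfileCredentialsProvider}} so the
example runs without the AWS SDK, and the {{getInstance()}} accessor name is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedProviderSketch {

  /** Hypothetical stand-in for the SDK's InstanceProfileCredentialsProvider. */
  static class MetadataProvider {
    // Counts how many provider objects have been constructed in this JVM.
    static final AtomicInteger CREATED = new AtomicInteger();

    MetadataProvider() {
      CREATED.incrementAndGet();
    }

    String getCredentials() {
      // A real provider would contact the EC2 Instance Metadata Service here.
      return "credentials-from-metadata-service";
    }
  }

  /** Subclass that permits a single instance (assumed shape, not the patch). */
  static final class SharedMetadataProvider extends MetadataProvider {
    private static final SharedMetadataProvider INSTANCE =
        new SharedMetadataProvider();

    // Private constructor: callers cannot create additional instances.
    private SharedMetadataProvider() {
      super();
    }

    static SharedMetadataProvider getInstance() {
      return INSTANCE;
    }
  }

  public static void main(String[] args) {
    MetadataProvider a = SharedMetadataProvider.getInstance();
    MetadataProvider b = SharedMetadataProvider.getInstance();
    System.out.println("same instance: " + (a == b));
    System.out.println("providers created: " + MetadataProvider.CREATED.get());
  }
}
```

Because the constructor is private, every caller that asks for the shared provider gets the
same object, no matter how many file system instances are created.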

As of AWS SDK 1.11.39, the SDK code internally enforces a singleton.  After Hadoop upgrades
to that version or higher, it's likely that we can remove this code.

Also, I have proposed a change to the FileSystem cache logic in HADOOP-13726 that would have
prevented this from surfacing.  That's going to be a much riskier change, so I'd still like
to proceed with the S3A change here.

> S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13727
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13727
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13727-branch-2.001.patch
>
>
> When running in an EC2 VM, S3A can make use of {{InstanceProfileCredentialsProvider}}
> from the AWS SDK to obtain credentials from the EC2 Instance Metadata Service.  We have observed
> that for a highly multi-threaded application, this may generate a high number of calls to
> the Instance Metadata Service.  The service may throttle the client by replying with an HTTP
> 429 response or forcibly closing connections.  We can greatly reduce the number of calls to
> the service by enforcing that all threads use a single shared instance of {{InstanceProfileCredentialsProvider}}.
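
The effect described in the issue can be sketched with a small simulation.  This is an
illustration under stated assumptions, not a measurement of the real service:
{{CachingProvider}} is a hypothetical provider that fetches credentials once per instance
and then serves them from its own cache, and {{SERVICE_CALLS}} stands in for HTTP requests
to the Instance Metadata Service.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MetadataCallCountSketch {

  /** Counts simulated requests to the metadata service. */
  static final AtomicInteger SERVICE_CALLS = new AtomicInteger();

  /** Hypothetical provider: one fetch per instance, then cached. */
  static class CachingProvider {
    private String cached;

    synchronized String getCredentials() {
      if (cached == null) {
        SERVICE_CALLS.incrementAndGet(); // stands in for an HTTP request
        cached = "temporary-credentials";
      }
      return cached;
    }
  }

  /** Runs one task per thread and returns the number of simulated calls. */
  static int run(int threads, Supplier<CachingProvider> source) {
    SERVICE_CALLS.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> source.get().getCredentials());
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return SERVICE_CALLS.get();
  }

  public static void main(String[] args) {
    // Each task builds its own provider, so each one hits the service.
    int perThread = run(64, CachingProvider::new);

    // All tasks share one provider, so only the first request hits the service.
    CachingProvider shared = new CachingProvider();
    int sharedCalls = run(64, () -> shared);

    System.out.println("one provider per thread: " + perThread + " calls");
    System.out.println("single shared provider:  " + sharedCalls + " calls");
  }
}
```

In this simulation the number of calls grows with the number of provider instances, while
the shared provider makes a single call regardless of thread count.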



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

