impala-reviews mailing list archives

From "Laszlo Gaal (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Date Mon, 13 Nov 2017 22:13:32 GMT
Hello Lars Volker, Michael Brown, Jim Apple, Philip Zeyliger, Sailesh Mukil, David Knupp, Joe
McDonnell, Tim Armstrong, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8294

to look at the new patch set (#3).

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
......................................................................

IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

For some time Impala in a production environment has been able
to access data stored in Amazon S3 buckets using credentials specified
in a number of ways:
- storing Amazon access keys in environment variables or
  in core-site.xml
- using proprietary management tools to store Amazon access keys
  securely
- using Amazon IAM roles bound to VMs running in EC2

The development minicluster environment used the first approach,
which risked leaking these keys.

This change enables Impala builds to use IAM
roles to access S3 buckets when running on an Amazon EC2 virtual
machine. The changes mainly ensure that environment variables and/or Jenkins
parameters carrying the traditional AWS credentials do not conflict with
credentials supplied by the IAM role attached to the VM instance.

The change also moves the logic performing the S3 access checks into a separate
script file: bin/check-s3-access.sh.

IAM role-based credentials are accessible through the EC2
instance metadata mechanism; for further details see Amazon's docs at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials
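
As a rough illustration of the lookup described in Amazon's docs, the
sketch below queries the instance metadata service with curl. The helper
names get_iam_role and get_role_credentials are hypothetical and are not
taken from this change. The sketch also assumes the instance answers plain
GET requests to the metadata service; instances that enforce session
tokens need an additional token header.

```shell
# Sketch only: hypothetical helpers, not code from bin/check-s3-access.sh.
# Link-local address of the EC2 instance metadata service.
METADATA_URL="http://169.254.169.254/latest/meta-data/iam/security-credentials"

# Print the name of the IAM role attached to this instance.
# Fails (non-zero exit) when no role is attached or when not running on EC2.
get_iam_role() {
  curl --silent --fail --max-time 2 "${METADATA_URL}/"
}

# Print the temporary credentials for the role named in $1 as JSON.
get_role_credentials() {
  curl --silent --fail --max-time 2 "${METADATA_URL}/$1"
}
```

On an instance with a role attached, get_role_credentials "$(get_iam_role)"
should print a JSON document whose fields include AccessKeyId,
SecretAccessKey, Token and Expiration.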

Changes to the configuration script:
1. bin/impala-config.sh stops setting the AWS_* environment variables
   to dummy default values. When AWS credentials are not supplied in
   the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
   these variables are unset (removed from the environment); otherwise
   they would interfere with authentication based on the IAM role.
2. Having AWS credentials in the AWS_* environment variables is now
   optional. They are still accepted to allow for private test runs
   accessing private/nondefault buckets with custom credentials.
3. bin/impala-config.sh now calls bin/check-s3-access.sh to perform the actual
   S3-dependent checks. check-s3-access.sh contains the S3-specific logic and
   network access needed to check if the requested S3 bucket is accessible
   for the build.
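
The first item above can be sketched roughly as follows. This is not the
code in bin/impala-config.sh: scrub_aws_env is a hypothetical helper, and
the sketch assumes that credentials count as "supplied" only when both
variables are set and non-empty.

```shell
# Sketch only: hypothetical helper illustrating the unset logic.
scrub_aws_env() {
  # Assumption: a missing or empty value in either variable means that no
  # usable key pair was supplied.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Remove the variables from the environment entirely instead of leaving
    # empty or dummy values behind, so they cannot interfere with
    # authentication based on the IAM role.
    unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
  fi
}
```

The function has to be called in the current shell (not in a subshell or
a child process) for the unset to affect the build environment.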

Changes to the minicluster configuration:
1. Security credentials for the s3n: connector, located in core-site.xml,
   are no longer replaced with actual AWS_* credentials when configuring
   the minicluster. These parameters are used by some front-end tests
   that don't actually reach out to S3; the s3n: notation just simulates
   non-HDFS storage.
   For these tests to work, s3n: authentication parameters still need to
   exist in core-site.xml. Their values do not matter, so the configuration
   template now has fixed dummy values for these parameters.

2. Remove empty s3a: security parameter sections from core-site.xml:
   The testdata/cluster/admin setup script substitutes values from
   environment variables into core-site.xml when it sets up the minicluster
   runtime environment.

   The configuration section for s3a: credentials is now completely
   removed if both of the following conditions are met:
   - the target filesystem is set to "s3"
   - the AWS credential environment variables AWS_ACCESS_KEY_ID
     and AWS_SECRET_ACCESS_KEY are both empty or missing.

   The configuration file core-site.xml.tmpl is extended with
   comment markers that delimit the section to be removed in this case.
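
A minimal sketch of the conditional removal described in item 2, assuming
a sed range delete between two comment markers. The marker strings, the
helper name strip_s3a_credentials and the use of a TARGET_FILESYSTEM
variable are assumptions made for illustration; the actual markers in
core-site.xml.tmpl and the logic in testdata/cluster/admin may differ.

```shell
# Sketch only: hypothetical markers and helper, not the code in this change.
S3A_BEGIN='<!-- BEGIN: s3a credentials -->'
S3A_END='<!-- END: s3a credentials -->'

# Read a template on stdin and write it to stdout, dropping the marked
# section when the target filesystem is "s3" and no AWS keys are supplied.
strip_s3a_credentials() {
  if [ "${TARGET_FILESYSTEM:-}" = "s3" ] &&
     [ -z "${AWS_ACCESS_KEY_ID:-}" ] && [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Delete every line from the begin marker through the end marker.
    sed "/${S3A_BEGIN}/,/${S3A_END}/d"
  else
    # Otherwise pass the template through unchanged.
    cat
  fi
}
```

With the section removed, no s3a: key properties remain in the generated
core-site.xml, so nothing there conflicts with the credentials supplied
by the IAM role.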

Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
---
A bin/check-s3-access.sh
M bin/impala-config.sh
M testdata/cluster/admin
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
4 files changed, 163 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8294/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 3
Gerrit-Owner: Laszlo Gaal <laszlo.gaal@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.gaal@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <philip@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
