spark-user mailing list archives

From Ranga <sra...@gmail.com>
Subject Re: S3 Bucket Access
Date Tue, 14 Oct 2014 18:10:17 GMT
One related question: could I specify the "com.amazonaws.services.s3.AmazonS3Client"
implementation for the "fs.s3.impl" parameter? Let me try that and update
this thread with my findings.
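
A minimal sketch of setting that parameter, for reference (this assumes
"fs.s3.impl" needs to name a Hadoop FileSystem implementation such as
NativeS3FileSystem, rather than a raw AWS SDK client class):

    val hadoopConf = sparkContext.hadoopConfiguration
    // fs.s3.impl selects the FileSystem class Hadoop instantiates for
    // s3:// paths; it must be an org.apache.hadoop.fs.FileSystem subclass.
    hadoopConf.set("fs.s3.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")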

On Tue, Oct 14, 2014 at 10:48 AM, Ranga <sranga@gmail.com> wrote:

> Thanks for the input.
> Yes, I did use the "temporary" access credentials provided by the IAM role
> (also detailed in the link you provided). The session token needs to be
> specified and I was looking for a way to set that in the header (which
> doesn't seem possible).
> Looks like a static key/secret is the only option.
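>
> A sketch of what that could look like if a session token were supported
> (this uses the s3a connector from later Hadoop releases, assuming Hadoop
> 2.8+; tempAccessKey, tempSecretKey, and tempSessionToken are placeholders
> for the IAM-role values):
>
>     val hc = sparkContext.hadoopConfiguration
>     // TemporaryAWSCredentialsProvider accepts the session token that
>     // s3n has no property for.
>     hc.set("fs.s3a.aws.credentials.provider",
>       "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
>     hc.set("fs.s3a.access.key", tempAccessKey)
>     hc.set("fs.s3a.secret.key", tempSecretKey)
>     hc.set("fs.s3a.session.token", tempSessionToken)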
>
> On Tue, Oct 14, 2014 at 10:32 AM, Gen <gen.tang86@gmail.com> wrote:
>
>> Hi,
>>
>> If I remember correctly, Spark cannot use IAM role credentials to access
>> S3. It first uses the id/key from the environment; if those are not set,
>> it uses the values in core-site.xml. So an IAM role is not useful for
>> Spark. The same problem occurs if you want to use the distcp command in
>> Hadoop.
>>
>>
>> Did you use curl http://169.254.169.254/latest/meta-data/iam/... to get
>> the "temporary" credentials? If so, they cannot be used directly by
>> Spark. For more information, take a look at
>> http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
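>>
>> A minimal sketch of that metadata lookup from Scala, in case it is
>> useful (the role name segment is whatever the security-credentials
>> listing returns):
>>
>>     import scala.io.Source
>>     // The first request returns the role name; the second returns a
>>     // JSON document with AccessKeyId, SecretAccessKey, Token, and
>>     // Expiration for that role.
>>     val base = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
>>     val role = Source.fromURL(base).mkString.trim
>>     val creds = Source.fromURL(base + role).mkString
>>     println(creds)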
>>
>>
>>
>> sranga wrote
>> > Thanks for the pointers.
>> > I verified that the access key-id/secret used are valid. However, the
>> > secret may contain "/" at times. The issues I am facing are as follows:
>> >
>> >    - The EC2 instances are set up with an IAMRole () and don't have a
>> >      static key-id/secret
>> >    - All of the EC2 instances have access to S3 based on this role (I
>> >      used s3ls and s3cp commands to verify this)
>> >    - I can get a "temporary" access key-id/secret based on the IAMRole,
>> >      but they generally expire in an hour
>> >    - If Spark is not able to use the IAMRole credentials, I may have to
>> >      generate a static key-id/secret. This may or may not be possible
>> >      in the environment I am in (from a policy perspective)
>> >
>> >
>> >
>> > - Ranga
>> >
>> > On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@...> wrote:
>> >
>> >> Hi,
>> >> Keep in mind that you're going to have a bad time if your secret key
>> >> contains a "/". This is due to an old and stupid Hadoop bug:
>> >> https://issues.apache.org/jira/browse/HADOOP-3733
>> >>
>> >> The best way is to regenerate the key so it does not include a "/".
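>> >>
>> >> If regenerating is not an option, a sketch of a common workaround is
>> >> to percent-encode the secret before embedding it in the URL (how well
>> >> the s3n code path handles the encoded form still depends on the bug
>> >> above; yourAccessKey/yourSecretKey are placeholders):
>> >>
>> >>     import java.net.URLEncoder
>> >>     // "/" in the secret becomes "%2F", keeping the URL parseable.
>> >>     val encodedSecret = URLEncoder.encode(yourSecretKey, "UTF-8")
>> >>     val lines = sparkContext.textFile(
>> >>       s"s3n://$yourAccessKey:$encodedSecret@<yourBucket>/path/")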
>> >>
>> >> /Raf
>> >>
>> >>
>> >> Akhil Das wrote:
>> >>
>> >> Try the following:
>> >>
>> >> 1. Set the access key and secret key in the sparkContext:
>> >>
>> >>> sparkContext.set("AWS_ACCESS_KEY_ID", yourAccessKey)
>> >>> sparkContext.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
>> >>
>> >>
>> >> 2. Set the access key and secret key in the environment before starting
>> >> your application:
>> >>
>> >>> export AWS_ACCESS_KEY_ID=<your access>
>> >>> export AWS_SECRET_ACCESS_KEY=<your secret>
>> >>
>> >>
>> >> 3. Set the access key and secret key inside the hadoop configurations
>> >>
>> >>> val hadoopConf = sparkContext.hadoopConfiguration
>> >>> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>> >>> hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>> >>> hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>> >>
>> >> 4. You can also try:
>> >>
>> >>> val lines = sparkContext.textFile("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>> >>
>> >>
>> >> Thanks
>> >> Best Regards
>> >>
>> >> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@gmail.com> wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> I am trying to access files/buckets in S3 and encountering a
>> >>> permissions issue. The buckets are configured to authenticate using
>> >>> an IAMRole provider. I have set the KeyId and Secret using environment
>> >>> variables (AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am
>> >>> still unable to access the S3 buckets.
>> >>>
>> >>> Before setting the access key and secret, the error was:
>> >>> "java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>> >>> Access Key must be specified as the username or password
>> >>> (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
>> >>> or fs.s3n.awsSecretAccessKey properties (respectively)."
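>> >>>
>> >>> Side note: the property names in that message are the s3n ones
>> >>> (fs.s3n.*, not the fs.s3.* keys suggested above), so a minimal sketch
>> >>> of setting exactly what it asks for, with yourAccessKey/yourSecretKey
>> >>> as placeholders:
>> >>>
>> >>>     val hc = sparkContext.hadoopConfiguration
>> >>>     hc.set("fs.s3n.awsAccessKeyId", yourAccessKey)
>> >>>     hc.set("fs.s3n.awsSecretAccessKey", yourSecretKey)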
>> >>>
>> >>> After setting the access key and secret, the error is: "The AWS Access
>> >>> Key Id you provided does not exist in our records."
>> >>>
>> >>> The id/secret being set are the right values. This makes me believe
>> >>> that something else ("token", etc.) needs to be set as well.
>> >>> Any help is appreciated.
>> >>>
>> >>>
>> >>> - Ranga
>> >>>
>> >>
>> >>
>> >>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/S3-Bucket-Access-tp16303p16397.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
