Sorry to hear about your travails.

I think you might be better off asking the Spark community:

On Wed, Mar 8, 2017 at 3:22 AM, Jonhy Stack <> wrote:

I'm trying to read an S3 bucket from Spark, and up until today Spark always complained that the request returned a 403:

    hadoopConf = spark_context._jsc.hadoopConfiguration()
    hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
    hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    logs = spark_context.textFile("s3a://mybucket/logs/*")
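
For comparison, here is a minimal sketch of the same configuration step with the credentials read from the environment instead of typed into the script. The helper names `s3a_settings` and `apply_settings` are hypothetical, and the use of the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables is an assumption; only the three `fs.s3a.*` property names come from the snippet above. It does not claim to explain the 403, it only rules out stray whitespace in the key strings.

```python
import os


def s3a_settings(env=os.environ):
    """Build the fs.s3a.* properties from environment variables (assumed names)."""
    access_key = (env.get("AWS_ACCESS_KEY_ID") or "").strip()
    secret_key = (env.get("AWS_SECRET_ACCESS_KEY") or "").strip()
    if not access_key or not secret_key:
        raise ValueError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set")
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def apply_settings(hadoop_conf, settings):
    """Copy each property onto any object exposing set(key, value)."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

With a live context this would be called as `apply_settings(spark_context._jsc.hadoopConfiguration(), s3a_settings())` before the `textFile` call.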

Spark was saying .... Invalid Access key [ACCESSKEY]

However, with the same ACCESSKEY and SECRETKEY, this was working with aws-cli:

    aws s3 ls mybucket/logs/

and in Python with boto3 this was working:

    resource = boto3.resource("s3", region_name="us-east-1")
    resource.Object("mybucket", "logs/") \
                .put(Body=open("", "rb"),ContentType="text/x-py")

so my credentials ARE valid and the problem is definitely something with Spark.

Today I decided to turn on the "DEBUG" log for the whole of Spark, and to my surprise, Spark is NOT using the [SECRETKEY] I have provided but instead adds a random one:

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD / Headers: (Authorization: AWS ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provide with fs.s3a.secret.key but instead invents a random one EACH time (every time I submit the job, the random secret key is different).
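
A note on reading that header, offered as a possible interpretation rather than a confirmed diagnosis: in the older S3 REST authentication scheme (signature version 2), the value after the colon in `Authorization: AWS ACCESSKEY:...` is not the secret key itself. It is a Base64-encoded HMAC-SHA1 signature computed from the secret key and the request details, including the `Date` header, so a different value on every submission would be expected even when the configured secret key is used. The sketch below illustrates this with the standard library only; the function names and the `/mybucket/` resource path are assumptions for illustration, and requests carrying `x-amz-*` headers would need those added to the string to sign.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, verb, date, resource, content_md5="", content_type=""):
    """Return the Base64 HMAC-SHA1 signature for a request without x-amz-* headers."""
    # String to sign: verb, Content-MD5, Content-Type, Date, then the resource.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, verb, date, resource, content_type=""):
    """Build the 'AWS <access key>:<signature>' header value."""
    signature = sign_v2(secret_key, verb, date, resource, content_type=content_type)
    return "AWS {}:{}".format(access_key, signature)


content_type = "application/x-www-form-urlencoded; charset=utf-8"
first = authorization_header(
    "ACCESSKEY", "SECRETKEY", "HEAD",
    "Wed, 08 Mar 2017 10:40:04 GMT", "/mybucket/", content_type,
)
later = authorization_header(
    "ACCESSKEY", "SECRETKEY", "HEAD",
    "Wed, 08 Mar 2017 10:45:00 GMT", "/mybucket/", content_type,
)
# Same secret key, different Date header: the part after the colon differs.
print(first != later)
```

If this reading is right, the changing value in the log says nothing about which secret key Spark used, and the 403 would have to be explained some other way.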

For the record, I'm running this locally on my machine (OS X) with this command:

    spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3

Could someone enlighten me on this?