spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17869) Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)
Date Tue, 11 Oct 2016 07:48:21 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564801#comment-15564801 ]

Sean Owen commented on SPARK-17869:
-----------------------------------

This isn't a Spark issue, right? It's an issue with the S3 config in your app or with the S3 library.
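
To make the configuration point concrete: the snippet quoted below maps fs.s3a.impl to NativeS3FileSystem, which is consistent with the trace reporting s3n:// and Jets3tNativeFileSystemStore rather than S3A. What follows is a minimal sketch of an S3A-only setup, not a confirmed fix for this ticket. The property names fs.s3a.access.key and fs.s3a.secret.key are the ones the S3A connector documents; the helper names s3a_hadoop_conf and apply_hadoop_conf are hypothetical and exist only for this illustration.

```python
# Sketch only: the helper names are hypothetical, not part of Spark or Hadoop.

# S3A connector class shipped in hadoop-aws.
S3A_IMPL = "org.apache.hadoop.fs.s3a.S3AFileSystem"

# AWS SDK v1 reads V4 signing from a JVM system property, not from the
# Hadoop configuration, so it has to be passed to the JVM at launch.
SIGV4_JVM_FLAG = "-Dcom.amazonaws.services.s3.enableV4=true"


def s3a_hadoop_conf(endpoint, access_key, secret_key):
    """Return Hadoop properties for the S3A connector as a plain dict."""
    return {
        "fs.s3a.impl": S3A_IMPL,
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }


def apply_hadoop_conf(sc, props):
    """Copy each property onto the Hadoop configuration of a SparkContext."""
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in props.items():
        hadoop_conf.set(key, value)


# Placeholder values taken from the quoted report.
props = s3a_hadoop_conf("s3.eu-central-1.amazonaws.com", "ACCESS_KEY", "SECRET_KEY")
```

In a running PySpark session this would be used as apply_hadoop_conf(sc, props) before reading an s3a:// path. The V4 flag would be supplied separately, for example through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions when the job is launched. Whether this combination clears the 400 with hadoop-aws 2.7.3 is not established in this thread.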

> Connect to Amazon S3 using signature version 4 (only choice in Frankfurt)
> -------------------------------------------------------------------------
>
>                 Key: SPARK-17869
>                 URL: https://issues.apache.org/jira/browse/SPARK-17869
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 2.0.1
>         Environment: Mac OS X / Ubuntu
> pyspark
> hadoop-aws:2.7.3
> aws-java-sdk:1.11.41
>            Reporter: Robin B
>
> Connection fails with **400 Bad request** for S3 in the Frankfurt region, where version 4 authentication is needed to connect.
> This issue is somewhat related to HADOOP-13325, but the solution (to include the endpoint explicitly) does nothing to ameliorate the problem.
>     sc._jsc.hadoopConfiguration().set('fs.s3a.impl','org.apache.hadoop.fs.s3native.NativeS3FileSystem')
>     sc._jsc.hadoopConfiguration().set('com.amazonaws.services.s3.enableV4','true')
>     sc.setSystemProperty('SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY','true')
>     sc._jsc.hadoopConfiguration().set('fs.s3a.endpoint','s3.eu-central-1.amazonaws.com')
>     sc._jsc.hadoopConfiguration().set('fs.s3a.awsAccessKeyId','ACCESS_KEY')
>     sc._jsc.hadoopConfiguration().set('fs.s3a.awsSecretAccessKey','SECRET_KEY')
>     df = spark.read.csv("s3a://BUCKET-NAME/filename.csv")
> yields:
> 	16/10/10 18:39:28 WARN DataSource: Error while looking for metadata directory.
> 	Traceback (most recent call last):
> 	  File "<stdin>", line 1, in <module>
> 	  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/readwriter.py", line 363, in csv
> 	    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 	  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
> 	  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
> 	    return f(*a, **kw)
> 	  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> 	py4j.protocol.Py4JJavaError: An error occurred while calling o35.csv.
> 	: java.io.IOException: s3n://BUCKET-NAME : 400 : Bad Request
> 		at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
> 		at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
> 		at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
> 		at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
> 		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 		at java.lang.reflect.Method.invoke(Method.java:497)
> 		at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 		at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 		at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
> 		at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
> 		at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
> 		at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
> 		at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:350)
> 		at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 		at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 		at scala.collection.immutable.List.foreach(List.scala:381)
> 		at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> 		at scala.collection.immutable.List.flatMap(List.scala:344)
> 		at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
> 		at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
> 		at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
> 		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 		at java.lang.reflect.Method.invoke(Method.java:497)
> 		at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> 		at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> 		at py4j.Gateway.invoke(Gateway.java:280)
> 		at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> 		at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 		at py4j.GatewayConnection.run(GatewayConnection.java:211)
> 		at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

