spark-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23652) Spark Connection with S3
Date Mon, 12 Mar 2018 16:00:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395428#comment-16395428 ]

Steve Loughran commented on SPARK-23652:
----------------------------------------

Don't use the s3:// connector, which ships with the ASF (as opposed to the EMR) Spark releases.
Use the "s3a" connector instead, which (a) interoperates with other clients of the S3 bucket,
(b) is maintained, and (c) doesn't use jets3t. Your stack trace will go away.

This isn't actually a Spark problem except at the general classpath level; it's happening
in the Hadoop libraries. But were you to file it there, it'd be closed as WONTFIX, as s3 and s3n
have both been cut from the codebase, leaving only the s3a connector, whose authentication
parameters you've actually been setting.
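
Something like this in spark-shell is the shape of the fix (an untested sketch: the hadoop-aws version below is illustrative and must match the Hadoop JARs on your classpath, and the bucket/path come from the report):

{code}
// Launch with the hadoop-aws module so the s3a classes (and the AWS SDK they
// depend on) are on the classpath, e.g.:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.5.0,org.apache.hadoop:hadoop-aws:2.7.3

// Same credential settings as before -- these were already s3a properties
sc.hadoopConfiguration.set("fs.s3a.access.key", "<your access key>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<your secret key>")

// The only other change: an s3a:// URL instead of s3://
val weekly = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .load("s3a://usr_bucket/data/file.csv")

weekly.show()
{code}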



> Spark Connection with S3
> ------------------------
>
>                 Key: SPARK-23652
>                 URL: https://issues.apache.org/jira/browse/SPARK-23652
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Shell, Spark Submit
>    Affects Versions: 1.6.0
>            Reporter: Abhishek Shrivastava
>            Priority: Critical
>
> In the spark-shell session below, I am trying to connect to S3 and load a file to create a DataFrame:
>  
> {code}
> spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
> scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> scala> sc.hadoopConfiguration.set("fs.s3a.access.key", "")
> scala> sc.hadoopConfiguration.set("fs.s3a.secret.key", "")
> scala> val weekly = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("delimiter", ",").load("s3://usr_bucket/data/file.csv")
> scala> print(weekly)
> scala> weekly.show()
> {code}
>
> {{Error:}}
> {code}
> java.lang.VerifyError: Bad type on operand stack
> Exception Details:
>   Location:
>     org/apache/hadoop/fs/s3/Jets3tFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @43: invokespecial
>   Reason:
>     Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
>   Current Frame:
>     bci: @43
>     flags: { }
>     locals: { 'org/apache/hadoop/fs/s3/Jets3tFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
>     stack: { 'org/apache/hadoop/fs/s3/Jets3tFileSystemStore', uninitialized 37, uninitialized 37, 'org/jets3t/service/security/AWSCredentials' }
>   Bytecode:
>     0000000: 2a2c b500 02bb 0003 59b7 0004 4e2d 2b2c
>     0000010: b600 05bb 0006 592d b600 072d b600 08b7
>     0000020: 0009 3a04 2abb 000a 5919 04b7 000b b500
>     0000030: 0ca7 0023 3a04 1904 b600 0ec1 000f 9900
>     0000040: 0c19 04b6 000e c000 0fbf bb00 1059 1904
>     0000050: b700 11bf 2abb 0012 592b b600 13b7 0014
>     0000060: b500 152a 2c12 1611 1000 b600 17b5 0018
>     0000070: b1
>   Exception Handler Table:
>     bci [19, 49] => handler: 52
>   Stackmap Table:
>     full_frame(@52,{Object[#194],Object[#195],Object[#196],Object[#197]},{Object[#198]})
>     append_frame(@74,Object[#198])
>     chop_frame(@84,1)
>   at org.apache.hadoop.fs.s3.S3FileSystem.createDefaultStore(S3FileSystem.java:119)
>   at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:109)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>   at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
>   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1307)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1302)
>   at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1342)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>   at org.apache.spark.rdd.RDD.first(RDD.scala:1341)
>   at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:269)
>   at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:265)
>   at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:242)
>   at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:74)
>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:171)
>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44)
>   at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
>   at $iwC$$iwC$$iwC.<init>(<console>:52)
>   at $iwC$$iwC.<init>(<console>:54)
>   at $iwC.<init>(<console>:56)
>   at <init>(<console>:58)
>   at .<init>(<console>:62)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
>   at org.apache.spark.repl.Main$.main(Main.scala:35)
>   at org.apache.spark.repl.Main.main(Main.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

