spark-user mailing list archives

From pnpritchard <>
Subject Re: SPARK SQL Error
Date Wed, 14 Oct 2015 17:18:33 GMT
I think the stack trace is quite informative.

Assuming line 10 of CsvDataSource is "val df =
sqlContext.load("com.databricks.spark.csv", Map("path" ->
args(1),"header"->"true"))", then the "args(1)" call is throwing an
ArrayIndexOutOfBoundsException because you aren't passing any command line
arguments to your application. When using spark-submit, you should put all of
your app's command line arguments at the end, after the jar. Keep in mind that
Scala arrays are zero-indexed, so args(1) is the second argument; if you pass
only the path, read it with args(0) instead. In your example, I think you'd
want:

spark-submit --master yarn --class org.spark.apache.CsvDataSource --files
hdfs:///people_csv /home/cloudera/Desktop/TestMain.jar hdfs:///people_csv
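To get a readable message instead of an ArrayIndexOutOfBoundsException, the app could check its arguments before indexing into them. Here is a minimal sketch in plain Scala (no Spark needed to run it). The object name CsvArgs and the helper inputPath are hypothetical, and it assumes the CSV path is the first argument (args(0)), as in the command above:

```scala
// Hypothetical helper: pulls the input path out of the program arguments.
object CsvArgs {
  val Usage = "Usage: CsvDataSource <input-path>"

  // Returns the first argument (args(0)) if present, otherwise the usage text.
  // headOption avoids the ArrayIndexOutOfBoundsException that indexing
  // directly into a too-short array would throw.
  def inputPath(args: Array[String]): Either[String, String] =
    args.headOption.toRight(Usage)

  def main(args: Array[String]): Unit =
    inputPath(args) match {
      case Right(path) =>
        // In the real app, path would go into Map("path" -> path, "header" -> "true").
        println(s"Reading CSV from $path")
      case Left(usage) =>
        System.err.println(usage)
        sys.exit(1)
    }
}
```

Run with no arguments, it prints the usage line and exits with status 1 rather than failing with a stack trace.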

Also, I don't think you need "--files hdfs:///people_csv" at all. The
documentation for this option says "Comma-separated list of files to be placed
in the working directory of each executor." Since you are going to read the
"people_csv" file from HDFS rather than from the local file system, shipping it
to each executor's working directory is unnecessary.

Sent from the Apache Spark User List mailing list archive at
