spark-user mailing list archives

From Rohit Verma <rohit.ve...@rokittech.com>
Subject Spark failing while persisting sorted columns.
Date Thu, 09 Mar 2017 09:41:09 GMT
Hi all,

Please help me with the scenario below.

I am running the query below on a large dataset (rowCount=100,000,000):

// Other instances of the job below are submitted to Spark concurrently from a multithreaded app.

final Dataset<Row> df = spark.read().parquet(tablePath);
// df is stored on HDFS: 5.64 GB in 45 blocks.
df.select(col).na().drop().dropDuplicates(col)
  .coalesce(20).sort(df.col(col)).coalesce(1)
  .write().mode(SaveMode.Ignore).csv(path);

I get the exception below:

Task failed while writing rows
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2991


Here are the Spark environment details:


  *   Cores in use: 20 Total, 0 Used
  *   Memory in use: 72.2 GB Total, 0.0 B Used

The process configuration is as follows:

"spark.cores.max", "20"
"spark.executor.memory", "3400MB"
"spark.kryoserializer.buffer.max", "1000MB"
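
As a rough sanity check on the figures quoted above, the sketch below works out the average HDFS block size and the average number of rows per block. The class and method names are hypothetical, and it assumes 1 GB = 1024 MB and an even spread of rows across blocks, which may not hold for the real data.

```java
public class DatasetSizing {
    // Figures quoted in this message.
    static final double TOTAL_GB = 5.64;
    static final int BLOCKS = 45;
    static final long ROWS = 100_000_000L;

    // Average block size in MB (assumption: 1 GB = 1024 MB).
    static double avgBlockMb() {
        return TOTAL_GB * 1024 / BLOCKS;
    }

    // Average rows per block (assumption: rows are spread evenly).
    static long avgRowsPerBlock() {
        return ROWS / BLOCKS;
    }

    public static void main(String[] args) {
        System.out.printf("avg block size: %.1f MB%n", avgBlockMb());
        System.out.printf("avg rows per block: %d%n", avgRowsPerBlock());
    }
}
```

Under these assumptions each block averages roughly 128 MB, while the final coalesce(1) leaves a single task to write the whole sorted result.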

Any leads would be highly appreciated.

Regards
Rohit Verma

