spark-user mailing list archives

From Srikanth <srikanth...@gmail.com>
Subject Spark with S3 DirectOutputCommitter
Date Fri, 09 Sep 2016 20:54:58 GMT
Hello,

I'm trying to use a DirectOutputCommitter for s3a in Spark 2.0. I've tried a
few configs and none of them seem to work: the output always goes through a
_temporary directory, and the rename step is killing performance.
I've read some notes about DirectOutputCommitter causing problems with
speculation turned on. Was this option removed entirely?

  val spark = SparkSession.builder()
    .appName("MergeEntities")
    .config("spark.sql.warehouse.dir", mergeConfig.getString("sparkSqlWarehouseDir"))
    .config("fs.s3a.buffer.dir", "/tmp")
    .config("spark.hadoop.mapred.output.committer.class", classOf[DirectOutputCommitter].getCanonicalName)
    .config("mapred.output.committer.class", classOf[DirectOutputCommitter].getCanonicalName)
    .config("mapreduce.use.directfileoutputcommitter", "true")
    //.config("spark.sql.sources.outputCommitterClass", classOf[DirectOutputCommitter].getCanonicalName)
    .getOrCreate()
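For comparison, here is the same idea expressed as a spark-defaults.conf fragment, together with the Hadoop commit-algorithm setting that is sometimes suggested as a workaround. This is a sketch under the assumption that the cluster runs Hadoop 2.7+ (where algorithm version 2 exists); it reduces the job-commit rename cost rather than eliminating _temporary entirely:

  # spark-defaults.conf sketch (assumes Hadoop 2.7+)
  # v2 commit algorithm: tasks rename straight to the destination,
  # avoiding the second rename pass at job commit
  spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
  # direct-style committers are unsafe with speculation, so keep it off
  spark.speculation                                             false
  spark.hadoop.fs.s3a.buffer.dir                                /tmp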

Srikanth
