spark-user mailing list archives

From Sandeep Parikh <sand...@clusterbeep.org>
Subject increasing concurrency of saveAsNewAPIHadoopFile?
Date Thu, 19 Jun 2014 19:38:50 GMT
I'm trying to write a JavaPairRDD to a downstream database using
saveAsNewAPIHadoopFile with a custom OutputFormat and the process is pretty
slow.

Is there a way to boost the concurrency of the save process? For example,
something like splitting the RDD into multiple smaller RDDs and using Java
threads to write the data out? That seems foreign to the way Spark works, so
I'm not sure whether there's a better way.
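In Spark, the "split into smaller pieces and write in parallel" idea is roughly what partitioning already does. `saveAsNewAPIHadoopFile` runs one write task per RDD partition, and each task gets its own `RecordWriter` from the `OutputFormat`. No more tasks run at once than there are executor cores. So instead of hand-rolled threads, one thing to try is calling `repartition(n)` on the `JavaPairRDD` before saving. The sketch below is a stdlib-only Java model of that behavior, not Spark code. The names `PartitionedWriteModel`, `PartitionWriter`, `WriterFactory` and `writeAll` are hypothetical. Records are hash-partitioned the way Spark's default `HashPartitioner` does it, and a fixed pool of `cores` threads stands in for executor slots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Stdlib-only model (not Spark code): records are hash-partitioned, and each
// partition is written by one task with its own writer on a pool of `cores`
// threads, so at most min(numPartitions, cores) writers are open at once.
public class PartitionedWriteModel {

    // Hypothetical stand-in for the per-task RecordWriter an OutputFormat hands out.
    interface PartitionWriter<K, V> {
        void write(K key, V value);
        void close();
    }

    // Hypothetical factory: opens one writer (e.g. one DB connection) per partition.
    interface WriterFactory<K, V> {
        PartitionWriter<K, V> open(int partitionId);
    }

    // Same rule as Spark's HashPartitioner: non-negative hashCode modulo.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Writes every record and returns how many were written.
    static <K, V> int writeAll(List<Map.Entry<K, V>> records, int numPartitions,
                               int cores, WriterFactory<K, V> factory) throws Exception {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> r : records) {
            partitions.get(partitionFor(r.getKey(), numPartitions)).add(r);
        }

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int p = 0; p < numPartitions; p++) {
                final int id = p;
                final List<Map.Entry<K, V>> part = partitions.get(p);
                // One task per partition, like one Spark task per RDD partition.
                results.add(pool.submit(() -> {
                    PartitionWriter<K, V> writer = factory.open(id);
                    try {
                        for (Map.Entry<K, V> r : part) {
                            writer.write(r.getKey(), r.getValue());
                        }
                    } finally {
                        writer.close();
                    }
                    return part.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // rethrows (wrapped) if a partition's write failed
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(Map.entry(i, "row-" + i));
        }
        AtomicInteger open = new AtomicInteger();
        AtomicInteger maxOpen = new AtomicInteger();
        // Tracks how many writers are open at the same time.
        WriterFactory<Integer, String> factory = id -> {
            maxOpen.accumulateAndGet(open.incrementAndGet(), Math::max);
            return new PartitionWriter<Integer, String>() {
                public void write(Integer key, String value) { /* e.g. add to a batch insert */ }
                public void close() { open.decrementAndGet(); }
            };
        };
        int written = writeAll(records, 8, 4, factory);
        System.out.println("written=" + written + ", maxConcurrentWriters=" + maxOpen.get());
    }
}
```

With 8 partitions and 4 threads, no more than 4 writers are ever open at once. Adding threads beyond the partition count would not add concurrency either. That is the same reason a small partition count can cap write parallelism in Spark, whatever the cluster size.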
