spark-user mailing list archives

From Ewan Leith <ewan.le...@realitymine.com>
Subject spark-csv package - output to filename.csv?
Date Thu, 03 Sep 2015 15:04:39 GMT
Using the spark-csv package or outputting to text files, you end up with files named:

test.csv/part-00

rather than a more user-friendly "test.csv", even if there's only 1 part file.

We can merge the files with Hadoop's FileUtil.copyMerge, using something like this code from http://deploymentzone.com/2015/01/30/spark-and-merged-csv-files/ :


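import java.net.URI
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.SparkContext

// Merge the part files under srcPath into a single file at dstPath.
def merge(sc: SparkContext, srcPath: String, dstPath: String): Unit = {
  val srcFileSystem = FileSystem.get(new URI(srcPath), sc.hadoopConfiguration)
  val dstFileSystem = FileSystem.get(new URI(dstPath), sc.hadoopConfiguration)
  // Delete any existing output at dstPath first.
  dstFileSystem.delete(new Path(dstPath), true)
  // deleteSource = true removes the srcPath directory once the merge is done.
  FileUtil.copyMerge(srcFileSystem, new Path(srcPath), dstFileSystem, new Path(dstPath),
    true, sc.hadoopConfiguration, null)
}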

but does anyone know a way without dropping down to Hadoop.fs code?
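
For context, this is roughly how the merge helper ends up being called. It is only a sketch: saveAsSingleCsv and the ".parts" suffix are hypothetical names, and it assumes a DataFrame written with the Spark 1.4+ df.write API and the merge function above in scope.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper: write df as CSV part files, then merge them into dstPath.
// Assumes merge(sc, srcPath, dstPath) from the snippet above is in scope.
def saveAsSingleCsv(sc: SparkContext, df: DataFrame, dstPath: String): Unit = {
  // The temporary directory must differ from dstPath, because merge
  // deletes dstPath before copying. The ".parts" suffix is an assumption.
  val tmpDir = dstPath + ".parts"
  // With the default save mode this fails if tmpDir already exists.
  df.write.format("com.databricks.spark.csv").save(tmpDir)
  // merge removes tmpDir afterwards (deleteSource = true).
  merge(sc, tmpDir, dstPath)
}
```

So saveAsSingleCsv(sc, df, "test.csv") would leave a single "test.csv" file, but it still goes through the Hadoop fs code, which is what I'd like to avoid.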

Thanks,
Ewan
