spark-user mailing list archives

From Jeroen Miller <bluedasya...@gmail.com>
Subject Re: Processing a splittable file from a single executor
Date Thu, 16 Nov 2017 09:41:59 GMT
On 16 Nov 2017, at 10:22, Michael Shtelma <mshtelma@gmail.com> wrote:
> you can call repartition(1) before starting to process your files. This
> will ensure that you end up with just one partition.

One question and one remark:

Q) val ds = sqlContext.read.parquet(path).repartition(1)

Can I be absolutely sure that the file here is read by a single executor, and that no data
shuffling takes place afterwards to obtain that single partition?
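(For what it's worth, my own understanding, sketched below under the assumption of a
plain `sqlContext` session: Parquet is splittable, so the scan itself is still
parallelized across executors, and `repartition(1)` then *adds* a shuffle to collapse
the result into one partition. `coalesce(1)` also yields one partition but is a narrow
transformation, so it can pull the whole read into a single task instead.)

    // Sketch only, assuming an existing sqlContext and a Parquet `path`.
    // repartition(1) inserts an Exchange above the parallel FileScan:
    val ds = sqlContext.read.parquet(path).repartition(1)
    ds.rdd.getNumPartitions   // 1, but the scan still ran as many tasks
    ds.explain()              // plan shows an Exchange node over the scan

    // coalesce(1) is narrow: one partition, and the scan itself can run
    // as a single task (at the cost of no read parallelism at all):
    val ds2 = sqlContext.read.parquet(path).coalesce(1)
    ds2.rdd.getNumPartitions  // 1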

R) This approach did not work for me.

    val ds = sqlContext.read.parquet(path).repartition(1)
    
    // ds on a single partition

    ds.createOrReplaceTempView("ds")

    val result = sqlContext.sql("... from ds")

    // result on 166 partitions... How to force the processing on a
    // single executor?

    result.write.csv(...)

    // 166 files :-/
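(A guess at what is happening, with a hedged sketch: the SQL query presumably contains
an aggregation or join, which introduces a new shuffle whose partition count is set by
`spark.sql.shuffle.partitions`, not by the earlier `repartition(1)`. Two things I
believe would give a single output file, assuming the same `sqlContext` and view as
above; the output path is illustrative only.)

    // (a) force the SQL shuffle itself down to one partition:
    sqlContext.setConf("spark.sql.shuffle.partitions", "1")
    val result = sqlContext.sql("... from ds")

    // (b) or leave the shuffle parallel and collapse only the write:
    result.coalesce(1).write.csv("/tmp/out")  // hypothetical path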

Jeroen


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

