spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: [SQL] Write parquet files under partition directories?
Date Tue, 02 Jun 2015 05:25:28 GMT
There will be in 1.4:

df.write.partitionBy("year", "month", "day").parquet("/path/to/output")
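
For reference, a fuller sketch of the same call against the 1.4 API. The
sample rows, column names, and output path below are illustrative, not
from this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PartitionedWriteExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitioned-write").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Illustrative data: one row per (year, month, day, value).
    val df = Seq(
      (2015, 6, 1, "a"),
      (2015, 6, 2, "b"),
      (2014, 10, 3, "c")
    ).toDF("year", "month", "day", "value")

    // Each distinct (year, month, day) combination becomes its own
    // directory, e.g. /path/to/output/year=2015/month=6/day=2/part-*
    df.write.partitionBy("year", "month", "day").parquet("/path/to/output")
  }
}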

On Mon, Jun 1, 2015 at 10:21 PM, Matt Cheah <mcheah@palantir.com> wrote:

> Hi there,
>
> I noticed in the latest Spark SQL programming guide
> <https://spark.apache.org/docs/latest/sql-programming-guide.html> that
> there is support for optimized reading of partitioned Parquet files that
> follow a particular directory structure (year=1/month=10/day=3, for
> example). However, I see no analogous way to write DataFrames as Parquet
> files with similar directory structures based on user-provided
> partitioning.
>
> Generally, is it possible to write DataFrames as partitioned Parquet files
> that downstream partition discovery can take advantage of later? I
> considered extending the Parquet output format, but it looks like
> ParquetTableOperations.scala hard-codes the output format to
> AppendingParquetOutputFormat.
>
> Also, I was wondering whether it would be valuable to contribute support
> for writing Parquet in partition directories as a PR.
>
> Thanks,
>
> -Matt Cheah
>
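
For completeness: reading that output back in 1.4 discovers the partition
columns from the year=/month=/day= directory names, as described in the
programming guide. A minimal continuation of the sketch above (same
sqlContext and output path, both illustrative):

import sqlContext.implicits._

// Partition discovery turns the directory names back into
// year/month/day columns of the DataFrame.
val readBack = sqlContext.read.parquet("/path/to/output")

// Filters on partition columns prune non-matching directories, so
// only files under year=2015/month=6 are scanned here.
readBack.filter($"year" === 2015 && $"month" === 6).show()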
