flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: writeAsCSV with partitionBy
Date Mon, 15 Feb 2016 11:20:53 GMT
Hi Srikanth,

DataSet.partitionBy() will partition the data on the declared partition
fields.
If you append a DataSink with the same parallelism as the partition
operator, the data will be written out with the defined partitioning.
It should be possible to achieve the behavior you described using
DataSet.partitionByHash() or partitionByRange().
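
For example, here is a minimal sketch of that pattern (the tuple layout, field
positions, and output path below are placeholder assumptions, not taken from
your setup):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class PartitionedCsvWrite {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Hypothetical input: (date, hour, payload) records.
    DataSet<Tuple3<String, Integer, String>> records = env.fromElements(
        new Tuple3<>("2016-02-12", 10, "a"),
        new Tuple3<>("2016-02-12", 11, "b"),
        new Tuple3<>("2016-02-13", 10, "c"));

    records
        // hash-partition on the date and hour fields (positions 0 and 1)
        .partitionByHash(0, 1)
        // placeholder output path; with parallelism > 1 this becomes a
        // directory containing one file per sink subtask
        .writeAsCsv("file:///tmp/flink-csv-out")
        // keep the sink parallelism equal to the partition operator's
        // parallelism so the partitioning is preserved in the output
        .setParallelism(env.getParallelism());

    env.execute("partitioned CSV write");
  }
}

With this, all records that share the same (date, hour) combination end up in
the same output file.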

Best, Fabian


2016-02-12 20:53 GMT+01:00 Srikanth <srikanth.ht@gmail.com>:

> Hello,
>
> Is there a Hive (or Spark DataFrame) partitionBy equivalent in Flink?
>
> I'm looking to save output as CSV files partitioned by two columns (date
> and hour).
>
> The partitionBy DataSet API is more for partitioning the data on a column
> for further processing.
>
> I'm thinking there is no direct API to do this, but what would be the
> best way of achieving it?
>
> Srikanth
