flink-user mailing list archives

From: Rafi Aroch <rafi.ar...@gmail.com>
Subject: BucketingSink capabilities for DataSet API
Date: Thu, 25 Oct 2018 11:08:40 GMT
Hi,

I'm writing a Batch job which reads Parquet, does some aggregations and
writes back as Parquet files.
I would like the output to be partitioned by year, month and day of the
event time, similar to the functionality of the BucketingSink.

I was able to get reading from and writing to Parquet working by using the
hadoop-compatibility features.
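
For reference, here's roughly how the wiring looks on my side (a
simplified sketch using Flink's Hadoop-compatibility wrappers together
with parquet-avro's AvroParquetInputFormat/AvroParquetOutputFormat; the
schema and paths below are placeholders, not my exact code):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetBatchJob {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read Parquet records as (Void, GenericRecord) pairs through the
    // Hadoop-compatibility wrapper.
    Job inputJob = Job.getInstance();
    FileInputFormat.addInputPath(inputJob, new Path("hdfs:///data/input"));
    DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(
        new HadoopInputFormat<>(new AvroParquetInputFormat<GenericRecord>(),
            Void.class, GenericRecord.class, inputJob));

    // ... aggregations happen here ...

    // Write the result back as Parquet. Everything ends up in this single
    // output directory -- this is the part I'd like to bucket by day.
    Job outputJob = Job.getInstance();
    Schema schema = new Schema.Parser().parse(    // placeholder schema
        "{\"type\":\"record\",\"name\":\"Agg\",\"fields\":["
            + "{\"name\":\"eventTime\",\"type\":\"long\"},"
            + "{\"name\":\"count\",\"type\":\"long\"}]}");
    AvroParquetOutputFormat.setSchema(outputJob, schema);
    FileOutputFormat.setOutputPath(outputJob, new Path("hdfs:///data/output"));
    records.output(new HadoopOutputFormat<>(
        new AvroParquetOutputFormat<GenericRecord>(), outputJob));

    env.execute("parquet-batch");
  }
}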
However, I couldn't find a way to partition the data by year, month and day
into a corresponding folder hierarchy. Everything is written to a single
directory.

I found an unanswered question about this issue on Stack Overflow:
https://stackoverflow.com/questions/52204034/apache-flink-does-dataset-api-support-writing-output-to-individual-file-partit

Can anyone suggest a way to achieve this? Is there maybe a way to integrate
the BucketingSink with the DataSet API, or another solution?
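
In the meantime, the only workaround I could think of is to compute the
distinct days up front and then attach one sink per day directory within
the same job. A rough, untested sketch continuing from the setup above
(it assumes each record carries an "eventTime" field in epoch millis;
"eventTime" and dayOf are made up for illustration):

// Additional imports for this part:
// import java.time.Instant;
// import java.time.ZoneOffset;
// import java.time.format.DateTimeFormatter;
// import java.util.List;
// import org.apache.flink.api.common.typeinfo.Types;

// Collect the distinct day buckets first (runs a small intermediate job).
List<String> days = records
    .map(t -> dayOf(t.f1))
    .returns(Types.STRING)
    .distinct()
    .collect();

// Attach one Parquet sink per day directory, filtering the records for it.
for (String day : days) {
  Job job = Job.getInstance();
  AvroParquetOutputFormat.setSchema(job, schema);
  FileOutputFormat.setOutputPath(job, new Path("hdfs:///data/output/" + day));
  records
      .filter(t -> dayOf(t.f1).equals(day))
      .output(new HadoopOutputFormat<>(
          new AvroParquetOutputFormat<GenericRecord>(), job));
}
env.execute("partitioned-parquet-write");

// Helper: "yyyy/MM/dd" path fragment derived from the record's event time.
static String dayOf(GenericRecord record) {
  long millis = (long) record.get("eventTime");
  return DateTimeFormatter.ofPattern("yyyy/MM/dd")
      .withZone(ZoneOffset.UTC)
      .format(Instant.ofEpochMilli(millis));
}

This re-reads the input and adds one sink per day, so it's probably only
reasonable for a bounded number of days. A custom OutputFormat that keeps
one Parquet writer per bucket path would avoid the extra pass, but that
feels like re-implementing the BucketingSink.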

Rafi
