flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Writing Parquet files with Flink
Date Thu, 28 Jan 2016 15:12:13 GMT
Hi all,

I have been reading about the optimal Parquet file size in relation to
the HDFS block size. The ideal situation for Parquet is when its block
size (and thus the maximum size of each row group) equals the HDFS
block size. By default, however, the size of each output file written
by Flink depends on the output parallelism, so I don't know how to
achieve that.
Is that feasible?
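
The sizing arithmetic behind the question can be sketched as follows.
This is only an illustration, not a Flink or Parquet API: the function
name and its parameters are hypothetical, and it assumes the total
output size is known in advance and that records are spread evenly
across the sink tasks, neither of which is guaranteed in practice.

```python
import math


def suggested_parallelism(total_bytes: int,
                          hdfs_block_bytes: int,
                          blocks_per_file: int = 1) -> int:
    """Hypothetical helper: estimate how many parallel writers are needed
    so that each output file is roughly `blocks_per_file` HDFS blocks.

    Assumes an even distribution of data across writers (an assumption,
    not something Flink guarantees).
    """
    if hdfs_block_bytes <= 0 or blocks_per_file <= 0:
        raise ValueError("block size and blocks_per_file must be positive")
    target_file_bytes = hdfs_block_bytes * blocks_per_file
    # At least one writer is always needed, even for tiny outputs.
    return max(1, math.ceil(total_bytes / target_file_bytes))


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    block_128_mib = 128 * 1024 ** 2
    print(suggested_parallelism(ten_gib, block_128_mib))
```

With one file per writer, an estimate like this would only give files
close to the block size if the output volume were stable and evenly
partitioned; the Parquet row-group size would still have to be
configured separately to match.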

