flink-user mailing list archives

From Frank Grimes <frankgrime...@yahoo.com>
Subject Is there a Flink DataSet equivalent to Spark's RDD.persist?
Date Thu, 21 Feb 2019 18:41:21 GMT
Hi,
I'm trying to port an existing Spark job to Flink and have gotten stuck on the same issue
brought up here:
https://stackoverflow.com/questions/46243181/cache-and-persist-datasets
Is there some way to accomplish the same thing in Flink, i.e. to avoid re-computing a particular
DataSet when multiple different subsequent transformations are required on it?
I've even tried explicitly writing out the DataSet to avoid the re-computation, but I'm still taking
an I/O hit for the initial write to HDFS and the subsequent re-reading of it in the following
stages. While this does yield a performance improvement over no caching at all, it doesn't
match the performance I get with RDD.persist in Spark.
Thanks,
Frank Grimes