flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Storage options for RocksDBStateBackend
Date Thu, 11 May 2017 12:16:05 GMT
Hi Ayush,

you’re right that RocksDB is the recommended state backend for the reasons
you mentioned. For recovery to work properly, you have to configure a shared
directory for the checkpoint data via state.backend.fs.checkpointdir. You can
configure any file system that is supported by Hadoop (no HDFS required),
because we use Hadoop to bridge between different file systems. The only
thing you have to make sure of is that the respective file system
implementation is on your class path.
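
For illustration, a minimal flink-conf.yaml sketch of the setting described
above. The host name, port, bucket, and paths are placeholders (assumptions),
not values from this thread; only one checkpoint directory should be active.

```yaml
# Use RocksDB as the state backend.
state.backend: rocksdb

# Shared checkpoint directory; must be reachable from every node.
# Placeholder HDFS location:
state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints

# Placeholder S3 alternative (requires the S3 file system implementation
# on the class path):
# state.backend.fs.checkpointdir: s3://my-bucket/flink/checkpoints
```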

I think you can access Windows Azure Blob Storage via Hadoop [1], similar
to how you would access S3, for example.

If you use S3 to store your checkpoint data, then you will benefit from all
the advantages of S3 but also suffer from its drawbacks (e.g. that list
operations are more costly). But these are not specific to Flink.

A URL like file:// usually indicates a local file. Thus, if your Flink
cluster is not running on a single machine, then this won’t work.
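
As a rough sketch of the point above, a checkpoint URI can be inspected for
its scheme before deployment. The class and method names below are
hypothetical helpers, not part of Flink, and the example URIs are
placeholders.

```java
import java.net.URI;

public class CheckpointUriCheck {

    /**
     * Returns true when the URI has no scheme or the "file" scheme,
     * i.e. when it refers to a path on the local file system.
     */
    public static boolean isLocalOnly(String checkpointDir) {
        String scheme = URI.create(checkpointDir).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Placeholder URIs for illustration only.
        String[] candidates = {
            "file:///var/flink/checkpoints",
            "hdfs://namenode:8020/flink/checkpoints",
            "s3://my-bucket/flink/checkpoints"
        };
        for (String dir : candidates) {
            String note = isLocalOnly(dir)
                ? "local path, review before using on a multi-machine cluster"
                : "non-local scheme";
            System.out.println(dir + " -> " + note);
        }
    }
}
```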

[1] https://hadoop.apache.org/docs/stable/hadoop-azure/index.html


On Thu, May 11, 2017 at 10:41 AM, Ayush Goyal <ayush@helpshift.com> wrote:

> Hello,
> I had a few questions regarding checkpoint storage options using
> RocksDBStateBackend. In the Flink 1.2 documentation, it is the recommended
> state backend due to its ability to store large states and asynchronous
> snapshotting.
> For high availability, it seems HDFS is the recommended store for state
> backend data. In the AWS deployment section, it is also mentioned that S3
> can be used for storing state backend data.
> We don't want to depend on a Hadoop cluster for our Flink deployment, so I
> had the following questions:
> 1. Can we use any storage backend supported by Flink for storing RocksDB
> StateBackend data with file URLs? There are quite a few supported, as
> mentioned here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/filesystems.html
> and here:
> https://github.com/apache/flink/blob/master/docs/dev/batch/connectors.md
> 2. Is there some work already done to support Windows Azure Blob Storage
> for storing state backend data? There are some docs here:
> https://github.com/apache/flink/blob/master/docs/dev/batch/connectors.md
> Can we utilize this for that?
> 3. If utilizing S3 for state backend, is there any performance impact?
> 4. For high availability, can we use an NFS volume for the state backend,
> with "file://" URLs? Will there be any performance impact?
> PS: I posted this email earlier via Nabble, but it's not showing up in
> the Apache archive, so I am sending it again. Apologies if it results in
> multiple threads.
> -- Ayush
