flink-user mailing list archives

From Michael-Keith Bernard <mkbern...@opentable.com>
Subject Flink + S3
Date Tue, 19 Apr 2016 01:54:37 GMT
Hello Flink Users!

I'm a Flink newbie in the early stages of deploying our first Flink cluster to production,
and I have a few questions about wiring up Flink with S3:

* We are going to use the HA configuration[1] from day one (we already have ZooKeeper
infrastructure). Can S3 be used as a state backend for the JobManager? The documentation talks about
using S3 as a state backend for the TaskManagers[2] (in particular for streaming), but I'm wondering
whether it's a suitable backend for the JobManager as well.

* How do I configure S3 for Flink when I don't have an existing Hadoop cluster? The
documentation references the Hadoop configuration manifest[3], which seems to imply
that I must already be running Hadoop (or at least have a properly configured Hadoop cluster).
Is there an example somewhere of using S3 as a storage backend for a standalone cluster?

* Bonus: I'm writing a Puppet module for installing, configuring, and managing Flink in standalone
mode with an existing ZooKeeper cluster. Are there any existing modules for this (I didn't find anything
on the Puppet Forge)? Would others in the community be interested if we published our module to the Forge
once it's complete?

Thanks so much for your time and consideration. We look forward to using Flink in production!


[1]: https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html#standalone-cluster-high-availability

[2]: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#s3-simple-storage-service

[3]: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#set-s3-filesystem