flink-user mailing list archives

From Scott Kidder <kidder.sc...@gmail.com>
Subject Re: S3 checkpointing in AWS in Frankfurt
Date Wed, 23 Nov 2016 15:38:20 GMT
Hi Jonathan,

You might be better off creating a small Hadoop HDFS cluster just for the
purpose of storing Flink checkpoint & savepoint data. Like you, I tried
using S3 to persist Flink state, but encountered AWS SDK issues and felt
like I was going down an ill-advised path. I then created a small 3-node
HDFS cluster in the same region as my Flink hosts but distributed across 3
AZs. The checkpointing is very fast and, most importantly, just works.
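In case it helps, pointing a job at an HDFS checkpoint directory looks
roughly like this (the namenode host, port, and path are placeholders for
your own cluster, nothing specific to my setup):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class HdfsCheckpointSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Keep checkpoint data on the HDFS cluster instead of S3;
            // "namenode", the port, and the path are placeholders.
            env.setStateBackend(
                    new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));

            // ... define sources, transformations, sinks, then execute ...
        }
    }

The FsStateBackend keeps working state on the TaskManager heap and only
writes snapshots out on each checkpoint, which is part of why a same-region
HDFS cluster makes checkpoints fast.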

Is there a firm requirement to use S3, or could you use HDFS instead?


--Scott Kidder

On Tue, Nov 22, 2016 at 11:52 PM, Jonathan Share <jon.share@gmail.com>
wrote:

> Hi,
> I'm interested in hearing whether anyone else has experience using Amazon
> S3 as a state backend in the Frankfurt region. For political reasons we've
> been asked to keep all European data in Amazon's Frankfurt region. This
> causes a problem because the S3 endpoint in Frankfurt requires AWS
> Signature Version 4 ("This new Region supports only Signature Version 4"
> [1]), which doesn't appear to work with the Hadoop version that Flink is
> built against [2].
> After some hacking we have managed to create a Docker image with a build
> of Flink 1.2 master, copying over jar files from the Hadoop 3.0.0-alpha1
> package, and this appears to work for the most part, but we still suffer
> from classpath problems (conflicts between the AWS APIs used in Hadoop and
> those we want to use in our streams for interacting with Kinesis), and the
> whole thing feels a little fragile. Has anyone else tried this? Is there a
> simpler solution?
> As a follow-up question, we saw that with the checkpoint interval on three
> relatively simple streams set to 1 second, our S3 costs were higher than
> the EC2 costs for our entire infrastructure. This seems slightly
> disproportionate. For now we have reduced the checkpoint interval to 10
> seconds, and that has greatly improved the cost projections graphed in
> Amazon CloudWatch, but I'm interested in hearing other people's experience
> with this. Is that the kind of billing level we can expect, or is it a
> symptom of a misconfiguration? Is this a setup others are using? As we are
> using Kinesis as the source for all streams, I don't see a huge risk with
> larger checkpoint intervals, and our sinks are designed to mostly tolerate
> duplicates (some improvements can be made).
> Thanks in advance
> Jonathan
> [1] https://aws.amazon.com/blogs/aws/aws-region-germany/
> [2] https://issues.apache.org/jira/browse/HADOOP-13324
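On the interval question: the checkpoint interval is set per job on the
execution environment, so it is cheap to experiment with. Roughly (the
10-second value just mirrors what you describe; the rest is illustrative):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointIntervalSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 10 seconds instead of every second. With a
            // replayable source such as Kinesis, a longer interval mostly
            // means more records get re-read after a failure.
            env.enableCheckpointing(10000L, CheckpointingMode.EXACTLY_ONCE);

            // ... define sources, transformations, sinks, then execute ...
        }
    }

Every checkpoint turns into writes from each stateful task, which is why
per-request S3 billing adds up so quickly at a 1-second interval.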
