flink-dev mailing list archives

From Cliff Resnick <cre...@gmail.com>
Subject Re: S3/S3A support
Date Wed, 12 Oct 2016 15:45:43 GMT
Regarding S3 and the Rolling/BucketingSink, we've seen data loss when
resuming from checkpoints: S3 FileSystem implementations flush to
temporary files, while the RollingSink expects a direct flush to
in-progress files. Because there is no such thing as "flush and resume
writing" on S3, I don't know if RollingSink can be workable in a pure S3
environment. We worked around it by using HDFS in a transient way.
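For context, the recovery behavior the sink relies on can be sketched with plain java.io (a minimal illustration of the semantics, not Flink's actual code): on restore, the in-progress file is truncated back to the last checkpointed byte offset and writing resumes from there. HDFS and local file systems support this; S3 objects are immutable, so there is no truncate-and-keep-writing on an existing object.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateResumeSketch {
    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0-0", ".in-progress");

        // Data written before the checkpoint, then uncommitted data that
        // must be discarded when the job restores from that checkpoint.
        Files.write(part, "checkpointed".getBytes());
        long checkpointedOffset = Files.size(part); // offset stored in the checkpoint
        Files.write(part, "uncommitted".getBytes(), StandardOpenOption.APPEND);

        // Restore: truncate back to the checkpointed offset, then resume
        // appending. This is the step an object store like S3 cannot provide.
        try (RandomAccessFile raf = new RandomAccessFile(part.toFile(), "rw")) {
            raf.setLength(checkpointedOffset);
        }
        Files.write(part, "-resumed".getBytes(), StandardOpenOption.APPEND);

        System.out.println(new String(Files.readAllBytes(part)));
        // prints "checkpointed-resumed"
        Files.delete(part);
    }
}
```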

On Tue, Oct 11, 2016 at 12:01 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> The "truncate()" functionality is only needed for the rolling/bucketing
> sink. The core checkpoint functionality does not need any truncate()
> behavior...
>
> Best,
> Stephan
>
>
> On Tue, Oct 11, 2016 at 5:22 PM, Vijay Srinivasaraghavan <
> vijikarthi@yahoo.com.invalid> wrote:
>
> > Thanks Stephan. My understanding is that checkpointing uses the
> > truncate API, but S3A does not support it. Will this have any impact?
> > Some of the known S3A client limitations are captured on the
> > Hortonworks site (https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html)
> > and I am wondering whether they have any impact on a Flink deployment
> > using S3?
> > Regards,
> > Vijay
> >
> >
> >
> >     On Tuesday, October 11, 2016 1:46 AM, Stephan Ewen
> > <sewen@apache.org> wrote:
> >
> >
> >  Hi!
> > In 1.2-SNAPSHOT, we recently fixed issues due to the "eventual
> > consistency" nature of S3. The fix is not in v1.1 - that is the only
> > known issue I can think of.
> > It results in occasional (seldom) periods of heavy restart retries,
> > until all files are visible to all participants.
> > If you run into that issue, it may be worthwhile to look at Flink
> > 1.2-SNAPSHOT.
> > Best,
> > Stephan
> >
> > On Tue, Oct 11, 2016 at 12:13 AM, Vijay Srinivasaraghavan
> > <vijikarthi@yahoo.com.invalid> wrote:
> >
> > Hello,
> > Per the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html),
> > it looks like the S3/S3A FS implementation is supported using standard
> > Hadoop S3 FS client APIs.
> > In the absence of using standard HCFS and going with S3/S3A,
> > 1) Are there any known limitations/issues?
> > 2) Do checkpoints/savepoints work properly?
> > Regards
> > Vijay
