flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Failures on DataSet programs
Date Wed, 28 Sep 2016 07:44:09 GMT
Hey Paulo! I think it's not possible out of the box at the moment, but
you can try the following as a workaround:

1) Create a custom OutputFormat that extends TextOutputFormat and
overrides the cleanup method:

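import org.apache.flink.api.java.io.TextOutputFormat;
import org.apache.flink.core.fs.Path;

public class NoCleanupTextOutputFormat<T> extends TextOutputFormat<T> {

    // TextOutputFormat has no no-arg constructor, so pass the path through
    public NoCleanupTextOutputFormat(Path outputPath) {
        super(outputPath);
    }

    @Override
    public void tryCleanupOnError() {
        // ignore cleanup on error so the partial output stays in place
    }

}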

2) writeAsFormattedText is actually a map + writeAsText (if you look
into DataSet.java). Instead of calling it, do the two steps manually:

dataSet.map(new FormattingMapper<>(clean(formatter)))
       .output(new NoCleanupTextOutputFormat(..))


This should work as expected. You can furthermore open an issue with a
feature request to allow configuring Flink's TextOutputFormat to
ignore cleanup.
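
To show why the override in step 1 keeps the partial file, here is a
minimal sketch that uses only the JDK and no Flink dependency. The
classes SimpleFileOutput and NoCleanupFileOutput are hypothetical
stand-ins for the output format and its subclass, not Flink classes, and
the failure handling only assumes that the runtime calls
tryCleanupOnError() after a failed write.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class CleanupDemo {

    /** Toy stand-in for a file-based output format (hypothetical, not a Flink class). */
    static class SimpleFileOutput {
        protected final Path target;

        SimpleFileOutput(Path target) {
            this.target = target;
        }

        /** Appends one record as a line, creating the file if needed. */
        void writeRecord(String record) throws IOException {
            Files.write(target, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        /** Default behaviour in this model: delete whatever was written so far. */
        void tryCleanupOnError() throws IOException {
            Files.deleteIfExists(target);
        }
    }

    /** Same writer, but the cleanup hook is overridden to do nothing. */
    static class NoCleanupFileOutput extends SimpleFileOutput {
        NoCleanupFileOutput(Path target) {
            super(target);
        }

        @Override
        void tryCleanupOnError() {
            // intentionally empty: keep the partial output
        }
    }

    /**
     * Writes records until a simulated failure at index failAt, then calls the
     * cleanup hook. Returns true if the output file still exists afterwards.
     */
    static boolean partialOutputSurvives(SimpleFileOutput out, List<String> records,
                                         int failAt) throws IOException {
        try {
            for (int i = 0; i < records.size(); i++) {
                if (i == failAt) {
                    throw new IllegalStateException("simulated task failure");
                }
                out.writeRecord(records.get(i));
            }
        } catch (IllegalStateException e) {
            out.tryCleanupOnError();
        }
        return Files.exists(out.target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cleanup-demo");
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println("no-cleanup keeps file: "
                + partialOutputSurvives(new NoCleanupFileOutput(dir.resolve("kept.txt")), records, 2));
        System.out.println("default keeps file: "
                + partialOutputSurvives(new SimpleFileOutput(dir.resolve("dropped.txt")), records, 2));
    }
}
```

In this model the default writer deletes its file after the simulated
failure, while the no-cleanup variant leaves the records written before
the failure on disk. Keep in mind that such a file is incomplete, so
whatever reads it afterwards has to tolerate missing records.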

Best,

Ufuk


On Tue, Sep 27, 2016 at 10:42 PM, Paulo Cezar <paulo.cezar@gogeo.io> wrote:
> Hi Folks,
>
> I was wondering if it's possible to keep partial outputs from dataset
> programs.
> I have a batch pipeline that writes its output on HDFS using
> writeAsFormattedText. When it fails the output file is deleted but I would
> like to keep it so that I can generate new inputs for the pipeline to avoid
> reprocessing.
>
> []'s
> Paulo Cezar
