flink-user mailing list archives

From "Paschek, Robert" <robert.pasc...@tu-berlin.de>
Subject Re: Writing Intermediates to disk
Date Sun, 19 Jun 2016 14:57:01 GMT
Hey,



thank you for your answers, and sorry for my late response :-(



The intention was to store some of the data to disk when main memory gets full, i.e. when my temporary
ArrayList<Tuple> reaches a pre-defined size.



I used com.opencsv.CSVReader and com.opencsv.CSVWriter for this task, and getRuntimeContext().getIndexOfThisSubtask()
to distinguish the filenames from those of other tasks running on the same machine.

Fortunately, that is no longer necessary for my work.
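For reference, the spill-to-disk pattern described above can be sketched with plain Java I/O (no opencsv dependency): buffer tuples in an ArrayList, flush them to a per-subtask file when a size limit is reached, then read everything back. The class name `SpillBuffer`, the `spillLimit` value, and the `subtaskIndex` parameter are illustrative assumptions; in a real Flink RichMapPartitionFunction the index would come from getRuntimeContext().getIndexOfThisSubtask(), and opencsv's CSVWriter/CSVReader would replace the raw line I/O.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the spill-to-disk pattern. Assumption: one String
// field per tuple; real code would serialize full tuples (e.g. via opencsv).
public class SpillBuffer {
    private final int spillLimit;               // max tuples held in memory
    private final Path spillFile;               // per-subtask file name
    private final List<String> buffer = new ArrayList<>();
    private BufferedWriter writer;

    public SpillBuffer(int spillLimit, int subtaskIndex, Path dir) {
        this.spillLimit = spillLimit;
        // Subtask index in the name keeps parallel instances from colliding.
        this.spillFile = dir.resolve("intermediates-" + subtaskIndex + ".csv");
    }

    public void add(String tuple) throws IOException {
        buffer.add(tuple);
        if (buffer.size() >= spillLimit) {
            flush();
        }
    }

    private void flush() throws IOException {
        if (writer == null) {
            writer = Files.newBufferedWriter(spillFile);
        }
        for (String t : buffer) {
            writer.write(t);
            writer.newLine();
        }
        buffer.clear();
    }

    // Flush remaining tuples, close the file, and read all of them back,
    // mirroring the "while (tupleOnDisk) collect(...)" step of the thread.
    public List<String> readBack() throws IOException {
        flush();
        writer.close();
        return Files.readAllLines(spillFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spill-demo");
        SpillBuffer sb = new SpillBuffer(2, 0, dir);
        for (String t : new String[]{"a", "b", "c"}) {
            sb.add(t);      // "a","b" spill when the limit of 2 is hit
        }
        List<String> restored = sb.readBack();
        if (!restored.equals(Arrays.asList("a", "b", "c"))) {
            throw new AssertionError("round trip failed: " + restored);
        }
        System.out.println("restored " + restored.size() + " tuples");
    }
}
```

The same idea generalizes to any record type: only the serialization of one tuple to a line (and back) changes.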



Best

Robert




________________________________
From: Vikram Saxena <vikram.mbt@gmail.com>
Sent: Monday, 9 May 2016 12:15
To: user@flink.apache.org
Subject: Re: Writing Intermediates to disk

I do not know if I understand completely, but I would create a new DataSet based on the filter
condition and then persist this DataSet.

So:

DataSet ds2 = ds1.filter(condition);

ds2.output(...);




On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi <uce@apache.org<mailto:uce@apache.org>>
wrote:
Flink has support for spillable intermediate results. Currently they
are only used when necessary to avoid pipeline deadlocks.

You can force this via

env.getConfig().setExecutionMode(ExecutionMode.BATCH);

This will write shuffles to disk, but you don't get the fine-grained
control you probably need for your use case.

- Ufuk

On Thu, May 5, 2016 at 3:29 PM, Paschek, Robert
<robert.paschek@tu-berlin.de<mailto:robert.paschek@tu-berlin.de>> wrote:
> Hi Mailing List,
>
>
>
> I want to write and read intermediates to/from disk.
>
> The following pseudocode snippet may illustrate my intention:
>
>
>
> public void mapPartition(Iterable<T> tuples, Collector<T> out) {
>
>     for (T tuple : tuples) {
>         if (condition)
>             out.collect(tuple);
>         else
>             writeTupleToDisk(tuple);
>     }
>
>     while (tupleOnDisk())
>         out.collect(readNextTupleFromDisk());
> }
>
>
>
> I am wondering if Flink provides a built-in class for this purpose. I
> also have to precisely identify the files holding the intermediates, due to the
> parallelism of mapPartition.
>
>
>
>
>
> Thank you in advance!
>
> Robert

