flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Iterative queries on Flink
Date Wed, 02 Dec 2015 15:29:27 GMT
Do you think it would be possible to push this forward? I need to implement
this interactive feature for DataSets. Could a persist() method (similar to
Spark's) be implemented in Flink? If you like, I can work on it given some
guidance.
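The persist() idea discussed in this thread is to write a dataset once under a generated name in a configured storage directory and later re-read that copy instead of recomputing it. It can be sketched in plain Python as follows. This is not Flink or Spark API: `PersistedDataset`, `persist()`, `reread()` and `base_dir` are hypothetical names, a local directory stands in for the Tachyon directory, and rows are assumed to be JSON-serializable.

```python
import json
import os
import tempfile
import uuid


class PersistedDataset:
    """Hypothetical dataset wrapper with a persist()/reread() pair.

    A local directory stands in for the storage directory (Tachyon in the
    thread) that would be configured for the cluster.
    """

    def __init__(self, rows, base_dir=None):
        self._rows = list(rows)  # rows assumed to be JSON-serializable
        # Fall back to a fresh temporary directory when none is configured.
        self.base_dir = base_dir or tempfile.mkdtemp(prefix="persisted-")
        self.path = None  # set to the generated file path by persist()

    def persist(self):
        # Write every row once, under a generated and unique file name.
        name = "dataset-" + uuid.uuid4().hex + ".jsonl"
        self.path = os.path.join(self.base_dir, name)
        with open(self.path, "w", encoding="utf-8") as out:
            for row in self._rows:
                out.write(json.dumps(row) + "\n")
        return self

    def reread(self):
        # Later queries read the stored copy instead of recomputing the rows.
        if self.path is None:
            raise RuntimeError("dataset has not been persisted yet")
        with open(self.path, encoding="utf-8") as src:
            return [json.loads(line) for line in src]
```

For example, `PersistedDataset(rows, base_dir).persist().reread()` returns the original rows. Each call to `persist()` writes a new file and nothing here deletes old copies, so a real implementation would also need a cleanup policy.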

On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Flavio,
>
> I worked on this some time ago, but it has not been merged yet and
> priorities shifted a bit. The pull request is here:
> https://github.com/apache/flink/pull/640
>
> The basic idea is to release Flink's in-memory ResultPartition buffers
> lazily, i.e. keep them as long as enough memory is available. When a
> job is resumed, it picks up the old results again. The pull request
> now needs an overhaul, and the API integration does not exist yet.
>
> Cheers,
> Max
>
> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > I think that with some support I could try to implement it. Actually, I just
> > need to add a persist(StorageLevel.OFF_HEAP) method to the DataSet API
> > (similar to what Spark does), write the dataset to a Tachyon directory
> > configured in flink-conf.yaml, and then re-read that dataset using its
> > generated name on Tachyon. Do you have other suggestions?
> >
> >
> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >>
> >> The basic building blocks are there, but I am not aware of any efforts to
> >> implement caching and add it to the API.
> >>
> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
> >>>
> >>> Is there any effort in this direction? Maybe I could achieve something
> >>> like that using Tachyon somehow?
> >>>
> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >>>>
> >>>> Hi Flavio,
> >>>>
> >>>> Flink does not support caching of data sets in memory yet.
> >>>>
> >>>> Best, Fabian
> >>>>
> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
> >>>>>
> >>>>> Hi all,
> >>>>> I was wondering if Flink could fit a use case where a user loads a
> >>>>> dataset in memory and then wants to explore it interactively. Let's
> >>>>> say I want to load a CSV, filter out the rows where a column value
> >>>>> matches some criterion, and then apply another criterion after seeing
> >>>>> the results of the first filter.
> >>>>> Is there a way to keep the dataset in memory and modify it
> >>>>> interactively without re-reading the whole dataset every time I want
> >>>>> to chain another operation to it?
> >>>>>
> >>>>> Best,
> >>>>> Flavio
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>
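Max's description of releasing result buffers lazily can be modeled as a cache with a memory budget that evicts the least recently used result only when a new one would not fit; a follow-up query either finds its input still cached or has to recompute it. The sketch below is a plain-Python illustration of that idea under assumed names (`LazyResultCache`, `put`, `get`, sizes supplied by the caller in bytes). It is not the code in the linked pull request and not Flink API.

```python
from collections import OrderedDict


class LazyResultCache:
    """Keeps finished results in memory until space runs short.

    Hypothetical model of lazy release: nothing is dropped eagerly; the
    least recently used results are evicted only when a new result
    would otherwise exceed the memory budget.
    """

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self._entries = OrderedDict()  # result_id -> (rows, size_bytes)

    def put(self, result_id, rows, size_bytes):
        # Replacing a result first releases the memory of the old copy.
        if result_id in self._entries:
            self.used -= self._entries.pop(result_id)[1]
        if size_bytes > self.budget:
            return False  # too large to cache; it must be recomputed later
        # Evict least recently used results only while the new one does not fit.
        while self.used + size_bytes > self.budget:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used -= evicted_size
        self._entries[result_id] = (rows, size_bytes)
        self.used += size_bytes
        return True

    def get(self, result_id):
        # A later job reuses the result if it survived; None means recompute.
        entry = self._entries.get(result_id)
        if entry is None:
            return None
        self._entries.move_to_end(result_id)  # mark as recently used
        return entry[0]
```

Applied to the use case in the first message, the parsed CSV would be stored once with `put`, and each new filter would start from `get(...)` instead of re-reading the file; a `None` result means the rows were evicted and have to be loaded again.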
