flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Iterative queries on Flink
Date Sat, 08 Oct 2016 14:41:29 GMT
Any progress in this direction? How much effort do you think is required
to implement this feature?

On 2 Dec 2015 16:29, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:

> Do you think it is possible to push this ahead? I need to implement
> this interactive feature for DataSets. Do you think it is possible to
> implement a persist() method in Flink (similar to Spark)? If you want, I
> can work on it with some guidance.
>
> On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <mxm@apache.org> wrote:
>
>> Hi Flavio,
>>
>> I was working on this some time ago but it didn't make it in yet and
>> priorities shifted a bit. The pull request is here:
>> https://github.com/apache/flink/pull/640
>>
>> The basic idea is to remove Flink's ResultPartition buffers in memory
>> lazily, i.e. keep them as long as enough memory is available. When a
>> new job is resumed, it picks up the old results again. The pull
>> request needs some overhaul now and the API integration is not there
>> yet.
>>
>> Cheers,
>> Max
>>
>> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier
>> <pompermaier@okkam.it> wrote:
>> > I think that with some support I could try to implement it. Actually, I just
>> > need to add a persist(StorageLevel.OFF_HEAP) method to the DataSet API
>> > (similar to what Spark does) and output it to a Tachyon directory
>> > configured in flink-conf.yaml, and then re-read that dataset using its
>> > generated name on Tachyon. Do you have other suggestions?
>> >
>> >
>> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> >>
>> >> The basic building blocks are there but I am not aware of any efforts to
>> >> implement caching and add it to the API.
>> >>
>> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>> >>>
>> >>> Is there any effort in this direction? Maybe I could achieve something
>> >>> like that using Tachyon in some way?
>> >>>
>> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> >>>>
>> >>>> Hi Flavio,
>> >>>>
>> >>>> Flink does not support caching of data sets in memory yet.
>> >>>>
>> >>>> Best, Fabian
>> >>>>
>> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>> >>>>>
>> >>>>> Hi to all,
>> >>>>> I was wondering if Flink could fit a use case where a user loads a
>> >>>>> dataset in memory and then wants to explore it interactively. Let's
>> >>>>> say I want to load a CSV, then filter out the rows where the column
>> >>>>> value matches some criterion, then apply another criterion after
>> >>>>> seeing the results of the first filter.
>> >>>>> Is there a way to keep the dataset in memory and modify it
>> >>>>> interactively without re-reading the whole dataset every time I want
>> >>>>> to chain another operation to my dataset?
>> >>>>>
>> >>>>> Best,
>> >>>>> Flavio
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>
>
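
The workaround proposed in the 30 Nov 2015 message above (write the intermediate
result once to a configured directory under a generated name, then re-read it
for each follow-up step) might look roughly like the sketch below. It is a plain
Python stand-in and does not use the Flink, Spark, or Tachyon APIs: the names
persist, MaterializedDataset, and storage_dir are hypothetical, and a local
temporary directory stands in for the Tachyon directory.

```python
import csv
import os
import tempfile
import uuid


class MaterializedDataset:
    """Hypothetical handle to rows written once under a generated name."""

    def __init__(self, storage_dir, name):
        self.storage_dir = storage_dir
        self.name = name

    @property
    def path(self):
        return os.path.join(self.storage_dir, self.name + ".csv")

    def read(self):
        # Re-read the materialized rows instead of recomputing them
        # from the original source.
        with open(self.path, newline="") as f:
            return [row for row in csv.reader(f)]


def persist(rows, storage_dir):
    """Write rows to storage_dir under a generated name; return a handle."""
    dataset = MaterializedDataset(storage_dir, "dataset-" + uuid.uuid4().hex)
    with open(dataset.path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return dataset


# Assumption: a local temp directory plays the role of the configured
# Tachyon directory.
storage_dir = tempfile.mkdtemp()

# Stand-in for the CSV loaded in the original use case.
source_rows = [["alice", "30"], ["bob", "17"], ["carol", "45"]]

# First filter: computed once, then persisted.
adults = persist([r for r in source_rows if int(r[1]) >= 18], storage_dir)

# Second filter, chosen after inspecting the first result: it reads the
# persisted rows and never touches source_rows again.
over_40 = [r for r in adults.read() if int(r[1]) > 40]
```

The trade-off in this approach is that every follow-up step pays for a read from
storage, whereas the lazily released ResultPartition buffers described in the
2 Dec 2015 reply would keep results in memory for as long as memory allows.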
