flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Iterative queries on Flink
Date Mon, 10 Oct 2016 13:15:42 GMT
There is still quite a bit needed to do this properly:
  (1) incremental recovery
  (2) network stack caching

(1) will probably happen quite soon, I am not aware of any committer having
concrete plans for (2).

Best,
Stephan


On Sat, Oct 8, 2016 at 4:41 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Any progress in this direction?how mich effort do you think it's required
> in order to implement this feature?
>
> On 2 Dec 2015 16:29, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>
>> Do you think it is possible to push ahead this thing? I need to implement
>> this interactive feature of Datasets. Do you think it is possible to
>> implement the persist() method in Flink (similar to Spark)? If you want I
>> can work on it with some instructions..
>>
>> On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <mxm@apache.org>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> I was working on this some time ago but it didn't make it in yet and
>>> priorities shifted a bit. The pull request is here:
>>> https://github.com/apache/flink/pull/640
>>>
>>> The basic idea is to remove Flink's ResultPartition buffers in memory
>>> lazily, i.e. keep them as long as enough memory is available. When a
>>> new job is resumed, it picks up the old results again. The pull
>>> request needs some overhaul now and the API integration is not there
>>> yet.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier
>>> <pompermaier@okkam.it> wrote:
>>> > I think that with some support I could try to implement it...actually
>>> I just
>>> > need to add a persist(StorageLevel.OFF_HEAP) method to the Dataset APIs
>>> > (similar to what Spark does..) and output it to a tachyon directory
>>> > configured in the flink-conf.yml and then re-read that dataset using
>>> its
>>> > generated name on tachyon. Do you have other suggestions?
>>> >
>>> >
>>> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhueske@gmail.com>
>>> wrote:
>>> >>
>>> >> The basic building blocks are there but I am not aware of any efforts
>>> to
>>> >> implement caching and add it to the API.
>>> >>
>>> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>> >>>
>>> >>> Is there any effort in this direction? maybe I could achieve
>>> something
>>> >>> like that using Tachyon in some way...?
>>> >>>
>>> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhueske@gmail.com>
>>> wrote:
>>> >>>>
>>> >>>> Hi Flavio,
>>> >>>>
>>> >>>> Flink does not support caching of data sets in memory yet.
>>> >>>>
>>> >>>> Best, Fabian
>>> >>>>
>>> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it
>>> >:
>>> >>>>>
>>> >>>>> Hi to all,
>>> >>>>> I was wondering if Flink could fit a use case where a user
load a
>>> >>>>> dataset in memory and then he/she wants to explore it
>>> interactively. Let's
>>> >>>>> say I want to load a csv, then filter out the rows where
the
>>> column value
>>> >>>>> match some criteria, then apply another criteria after seeing
the
>>> results of
>>> >>>>> the first filter.
>>> >>>>> Is there a way to keep the dataset in memory and modify
it
>>> >>>>> interactively without re-reading all the dataset every time
I want
>>> to chain
>>> >>>>> another operation to my dataset?
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> Flavio
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>> >
>>>
>>
>>

Mime
View raw message