flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Iterative queries on Flink
Date Wed, 02 Dec 2015 14:05:33 GMT
Hi Flavio,

I was working on this some time ago but it didn't make it in yet and
priorities shifted a bit. The pull request is here:
https://github.com/apache/flink/pull/640

The basic idea is to release Flink's in-memory ResultPartition buffers
lazily, i.e. keep them for as long as enough memory is available. When a
job is later resumed, it picks up the old results again. The pull
request needs an overhaul by now, and the API integration is not there
yet.
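
As a rough illustration of the lazy-release idea only (this is not the
code from the pull request and not Flink API; the class name
LazyResultCache and the byte-budget parameter are made up for this
sketch), finished results stay cached until a new result no longer fits
into the memory budget, and only then are the oldest ones dropped:

```python
from collections import OrderedDict


class LazyResultCache:
    """Hypothetical sketch: keep finished results until memory runs short."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes  # assumed fixed memory budget
        self.used_bytes = 0
        self._results = OrderedDict()     # result id -> bytes, oldest first

    def put(self, result_id, data):
        # A result larger than the whole budget is never cached.
        if len(data) > self.budget_bytes:
            return
        # Replacing an existing result frees its memory first.
        if result_id in self._results:
            self.used_bytes -= len(self._results.pop(result_id))
        # Lazy release: evict old results only when the new one would not fit.
        while self.used_bytes + len(data) > self.budget_bytes:
            _, evicted = self._results.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._results[result_id] = data
        self.used_bytes += len(data)

    def get(self, result_id):
        # None means the result was released and has to be recomputed.
        return self._results.get(result_id)
```

A resumed job would call get() with the id of an earlier result and fall
back to recomputing it if the result has been released in the meantime.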

Cheers,
Max

On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier
<pompermaier@okkam.it> wrote:
> I think that with some support I could try to implement it. Actually, I just
> need to add a persist(StorageLevel.OFF_HEAP) method to the DataSet API
> (similar to what Spark does), write the dataset to a Tachyon directory
> configured in flink-conf.yaml, and then re-read that dataset using its
> generated name on Tachyon. Do you have other suggestions?
>
>
> On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>>
>> The basic building blocks are there but I am not aware of any efforts to
>> implement caching and add it to the API.
>>
>> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>
>>> Is there any effort in this direction? Maybe I could achieve something
>>> like that using Tachyon in some way?
>>>
>>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>>>>
>>>> Hi Flavio,
>>>>
>>>> Flink does not support caching of data sets in memory yet.
>>>>
>>>> Best, Fabian
>>>>
>>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>
>>>>> Hi to all,
>>>>> I was wondering if Flink could fit a use case where a user loads a
>>>>> dataset in memory and then wants to explore it interactively. Let's
>>>>> say I want to load a CSV, then filter out the rows where the column
>>>>> value matches some criteria, then apply another criterion after seeing
>>>>> the results of the first filter.
>>>>> Is there a way to keep the dataset in memory and modify it
>>>>> interactively without re-reading the whole dataset every time I want
>>>>> to chain another operation onto it?
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>
>>>>
>>>
>>>
>>
>
>
