flink-user mailing list archives

From Vishnu Viswanath <vishnu.viswanat...@gmail.com>
Subject Re: maximum size of window
Date Fri, 01 Jul 2016 00:53:06 GMT
Thank you!

On Wednesday, 29 June 2016, Aljoscha Krettek <aljoscha@apache.org> wrote:

> Hi,
> the result of splitting by key is that processing can easily be
> distributed among the workers because the windows for individual keys can
> be processed independently. This should improve cluster utilization.
>
> Cheers,
> Aljoscha
>
> On Tue, 28 Jun 2016 at 17:26 Vishnu Viswanath <vishnu.viswanath25@gmail.com> wrote:
>
>> Hi,
>>
>> Thank you for the responses.
>> I am not sure if I will be able to use Fold/Reduce function, but I will
>> keep that in mind.
>>
>> I have one more question: what is the implication of having a key that
>> splits the data into windows of very small size (and therefore a large
>> number of small windows)?
>>
>> Thanks and Regards,
>> Vishnu Viswanath,
>>
>> On Tue, Jun 28, 2016 at 5:04 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:
>>
>>> Hi,
>>> one thing to add: if you use a ReduceFunction or a FoldFunction for your
>>> window the state will not grow with bigger window sizes or larger numbers
>>> of elements because the result is eagerly computed. In that case, state
>>> size is only dependent on the number of individual keys.
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Tue, 28 Jun 2016 at 10:36 Kostas Kloudas <k.kloudas@data-artisans.com> wrote:
>>>
>>>> Hi Vishnu,
>>>>
>>>> RocksDB allows the window contents to be stored on disk when the state
>>>> of a window becomes too big.
>>>> But when the window triggers and your window function is applied to
>>>> that big window, all of its state is loaded into memory.
>>>>
>>>> So although RocksDB lets you not worry about storage space while the
>>>> window is being formed, when it is time to fire your computation you
>>>> have to consider how much RAM you have and whether the window fits in it.
>>>>
>>>> Regards,
>>>> Kostas
>>>>
>>>>
>>>> On Jun 28, 2016, at 1:25 AM, Vishnu Viswanath <vishnu.viswanath25@gmail.com> wrote:
>>>>
>>>> Hi Kostas,
>>>>
>>>> Thank you.
>>>> Yes 2) was exactly what I wanted to know.
>>>>
>>>> - So if I am using RocksDB as state backend, does that mean that I
>>>> don't have to worry much about the memory available per node since RocksDB
>>>> will use RAM and Disk to store the window state?
>>>>
>>>> Regards,
>>>> Vishnu
>>>>
>>>>
>>>> On Mon, Jun 27, 2016 at 6:19 PM, Kostas Kloudas <k.kloudas@data-artisans.com> wrote:
>>>>
>>>>> Hi Vishnu,
>>>>>
>>>>> I hope the following will help answer your question:
>>>>>
>>>>> 1) Elements are first split by key (apart from global windows) and
>>>>> then put into windows. In other words, windows are keyed.
>>>>> 2) A window belonging to a certain key is handled by a single node. In
>>>>> other words, no matter how big the window is, its state (the elements
>>>>> it contains) will never be split between two or more nodes.
>>>>> 3) Where the state is stored depends on your state backend. Currently
>>>>> Flink supports an in-memory one, a filesystem one, and a RocksDB one,
>>>>> which is in the middle (first in memory and then on disk when needed).
>>>>> Of course, you can implement your own.
>>>>>
>>>>> From the above, you can see that if you use the memory-backed state
>>>>> backend, then your window size is limited by the memory
>>>>> available at each of your nodes. If you use the fs state backend, then
>>>>> your state is stored on disk. Finally, RocksDB will initially
>>>>> use RAM and then spill to disk when no more memory is available.
>>>>>
>>>>> Here I have to add that the window documentation is currently being
>>>>> re-written to explain new features introduced in Flink 1.1,
>>>>> which include more flexible handling of late events and more explicit
>>>>> state garbage collection.
>>>>>
>>>>> So please stay tuned!
>>>>>
>>>>> I hope this helps answer your question,
>>>>> Kostas
>>>>>
>>>>> > On Jun 27, 2016, at 8:22 PM, Vishnu Viswanath <vishnu.viswanath25@gmail.com> wrote:
>>>>> >
>>>>> > Hi All,
>>>>> >
>>>>> > - Is there any restriction on the size of a window in Flink with
>>>>> > respect to the memory of the nodes?
>>>>> > - What happens if a window grows larger than the size of a node? Will
>>>>> > it be split across multiple nodes?
>>>>> >
>>>>> > If I am going to have a huge window, should I have fewer nodes with
>>>>> > more memory?
>>>>> > Is there any documentation on how memory is managed/handled in the
>>>>> > case of windows and also in the case of joins?
>>>>> >
>>>>> > Regards,
>>>>> > Vishnu
>>>>>
>>>>>
>>>>
>>>>
>>
>>

-- 
Thanks and Regards,
Vishnu Viswanath,
*www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
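
The point made in the thread about ReduceFunction/FoldFunction (state that depends only on the number of keys, not on the number of elements per window) can be illustrated with a small sketch. This is plain Python, not the Flink API: the function names `buffered_window_sum` and `eager_window_sum`, the sum aggregation, and the sample events are all assumptions made for illustration, and "state size" here simply counts stored values.

```python
from collections import defaultdict


def buffered_window_sum(events):
    """Keep every element per key and aggregate only when the window fires.

    Hypothetical stand-in for a window function that needs all elements.
    """
    buffers = defaultdict(list)
    for key, value in events:
        buffers[key].append(value)
    # State held before firing: one stored value per element seen.
    state_size = sum(len(values) for values in buffers.values())
    results = {key: sum(values) for key, values in buffers.items()}
    return results, state_size


def eager_window_sum(events):
    """Fold each element into a running result as it arrives.

    Hypothetical stand-in for a reduce/fold style aggregation.
    """
    accumulators = {}
    for key, value in events:
        accumulators[key] = accumulators.get(key, 0) + value
    # State held before firing: one stored value per key.
    state_size = len(accumulators)
    return accumulators, state_size


events = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
buffered_result, buffered_state = buffered_window_sum(events)
eager_result, eager_state = eager_window_sum(events)
```

Both functions produce the same per-key sums, but the buffered variant stores one value per element while the eager variant stores one value per key, which is why eager aggregation keeps state small even for very large windows.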

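
The thread also notes that windows are keyed, that a window for a given key lives on a single node, and that many small per-key windows can be processed independently by different workers. A minimal sketch of that idea follows, again in plain Python rather than Flink. The CRC32-modulo assignment in `assign_worker` is an assumption chosen only because it is deterministic; it is not how Flink actually maps keys to workers. The `user-N` keys and the parallelism of 3 are likewise made up.

```python
import zlib
from collections import defaultdict


def assign_worker(key, parallelism):
    """Map a key to a worker index (hypothetical scheme, not Flink's)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def partition_by_key(events, parallelism):
    """Group elements by worker, then by key, so each key has one owner."""
    workers = defaultdict(lambda: defaultdict(list))
    for key, value in events:
        workers[assign_worker(key, parallelism)][key].append(value)
    return workers


# 24 elements spread over 6 hypothetical keys.
events = [(f"user-{i % 6}", i) for i in range(24)]
partitions = partition_by_key(events, parallelism=3)

# Every key, and therefore every window for that key, sits on one worker.
keys_seen = [key for per_key in partitions.values() for key in per_key]
total_elements = sum(
    len(values) for per_key in partitions.values() for values in per_key.values()
)
```

Because each key is owned by exactly one worker, a single very large key cannot be spread across nodes, while many small keys give the system more independent units of work to distribute.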