apex-dev mailing list archives

From Venkatesh Kottapalli <venkat...@datatorrent.com>
Subject Re: checkpoint statistics
Date Mon, 26 Sep 2016 06:11:58 GMT
+1 for this feature. Knowing the size of each operator's checkpoint and the time it takes
to write will help in tuning and in understanding the overhead, if any.


-Venkatesh.

> On Sep 25, 2016, at 10:56 PM, Chinmay Kolhatkar <chinmay@datatorrent.com> wrote:
> 
> +1. Very useful feature. We should also provide documentation on how to use
> that information for tuning.
> 
> On Sun, Sep 25, 2016 at 11:27 PM, Thomas Weise <thomas.weise@gmail.com>
> wrote:
> 
>> +1, very useful during tuning and for ongoing monitoring of the cost of
>> checkpointing (both serialization and I/O). Can also be used to identify
>> skew.
>> 
>> --
>> sent from mobile
>> On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <ram@datatorrent.com> wrote:
>> 
>>> We've seen cases where operator state continues to grow without bound,
>>> either because the developer was unaware of the importance of keeping
>>> state small or because of some anomaly downstream. In such cases, the
>>> operators could get killed with an OOM exception because these
>>> checkpoints are building up in memory faster than they can be written
>>> to disk.
>>> 
>>> These stats may be useful in such cases to identify the root cause of
>>> failure.
>>> 
>>> Ram
>>> 
>>> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <sandesh@datatorrent.com>
>>> wrote:
>>> 
>>>> Say a checkpoint takes x MB and y seconds. What does the user do with
>>>> that information?
>>>> 
>>>> On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <tushar@datatorrent.com>
>>>> wrote:
>>>> 
>>>>> +1
>>>>> 
>>>>> -Tushar
>>>>> 
>>>>> On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <sanjay@datatorrent.com>
>>>>> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> Sanjay
>>>>>> 
>>>>>> 
>>>>>> On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare <
>>>>>> devendrat@datatorrent.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Dev
>>>>>>> 
>>>>>>> On Sep 25, 2016 1:17 AM, "Pramod Immaneni" <pramod@datatorrent.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>>> On Sep 24, 2016, at 10:01 AM, Vlad Rozov <v.rozov@datatorrent.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> IMO, it may be useful to provide checkpoint statistics, for example,
>>>>>>>>> total size of checkpoint for particular window or average size of
>>>>>>>>> checkpoints for a particular operator. Also, how long it takes to
>>>>>>>>> write checkpoints to storage.
>>>>>>>>> 
>>>>>>>>> Thank you,
>>>>>>>>> 
>>>>>>>>> Vlad
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
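
The statistics proposed in this thread (total checkpoint size for a window,
average checkpoint size for an operator, and time taken to write checkpoints)
could be collected roughly as in the sketch below. This is only an
illustration under assumptions: CheckpointStats, its method names, and the
integer operator id / long window id parameters are hypothetical and are not
part of the Apex API, and how the size and write time would actually be
measured inside the engine is not covered here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the Apex API: collects per-checkpoint
// size and write-time samples and answers the aggregate questions raised
// in this thread.
class CheckpointStats {
    // Running totals for a single operator.
    private static final class OperatorTotals {
        long count;
        long totalBytes;
        long totalWriteMillis;
    }

    private final Map<Integer, OperatorTotals> byOperator = new HashMap<>();
    private final Map<Long, Long> bytesByWindow = new HashMap<>();

    // Record one checkpoint of one operator taken at the given window.
    void record(int operatorId, long windowId, long sizeBytes, long writeMillis) {
        OperatorTotals t = byOperator.computeIfAbsent(operatorId, k -> new OperatorTotals());
        t.count++;
        t.totalBytes += sizeBytes;
        t.totalWriteMillis += writeMillis;
        bytesByWindow.merge(windowId, sizeBytes, Long::sum);
    }

    // Total bytes checkpointed across all operators for one window.
    long totalSizeForWindow(long windowId) {
        return bytesByWindow.getOrDefault(windowId, 0L);
    }

    // Average checkpoint size in bytes for one operator (0.0 if none recorded).
    double averageSizeForOperator(int operatorId) {
        OperatorTotals t = byOperator.get(operatorId);
        return (t == null || t.count == 0) ? 0.0 : (double) t.totalBytes / t.count;
    }

    // Average time in milliseconds to write a checkpoint for one operator.
    double averageWriteMillisForOperator(int operatorId) {
        OperatorTotals t = byOperator.get(operatorId);
        return (t == null || t.count == 0) ? 0.0 : (double) t.totalWriteMillis / t.count;
    }

    public static void main(String[] args) {
        CheckpointStats stats = new CheckpointStats();
        // Made-up sample values, for illustration only.
        stats.record(1, 100L, 2_000_000L, 40L);
        stats.record(2, 100L, 500_000L, 10L);
        stats.record(1, 200L, 4_000_000L, 80L);
        System.out.println("window 100 total bytes: " + stats.totalSizeForWindow(100L));
        System.out.println("operator 1 avg bytes:   " + stats.averageSizeForOperator(1));
        System.out.println("operator 1 avg millis:  " + stats.averageWriteMillisForOperator(1));
    }
}
```

Comparing the per-operator averages across partitions of the same logical
operator is one way such numbers might help spot the skew mentioned above,
and a steadily rising average size would point at the unbounded state growth
case.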

