flink-user mailing list archives

From Hironori Ogibayashi <ogibaya...@gmail.com>
Subject Re: Handling large state (incremental snapshot?)
Date Tue, 05 Apr 2016 12:40:43 GMT
Aljoscha,

Thank you for your quick response.
Yes, I am using the FsStateBackend, so I will try the RocksDB backend.
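
For reference, here is roughly the change I plan to try. This is just a minimal, untested sketch: it assumes the flink-statebackend-rocksdb dependency is on the classpath, and the checkpoint URI and interval below are placeholders, not values from my actual setup.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep working state in RocksDB (on local disk) instead of the
        // TaskManager heap; snapshots are copied to the given checkpoint URI.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // Trigger a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60000);
    }
}
```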

Regards,
Hironori

2016-04-05 21:23 GMT+09:00 Aljoscha Krettek <aljoscha@apache.org>:
> Hi,
> I guess you are using the FsStateBackend, is that correct? You could try
> using the RocksDB state backend:
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html#the-rocksdbstatebackend
>
> With this, throughput will be lower, but the per-checkpoint overhead should
> also be lower. In addition, most of the file copying needed for a checkpoint
> is done while data processing keeps running (asynchronous snapshot).
>
> As for incremental snapshots: I'm afraid this feature is not implemented
> yet, but we're working on it.
>
> Cheers,
> Aljoscha
>
> On Tue, 5 Apr 2016 at 14:06 Hironori Ogibayashi <ogibayashi@gmail.com>
> wrote:
>>
>> Hello,
>>
>> I am trying to implement a windowed distinct count on a stream. In this
>> case, the state has to hold all distinct values in the window, so it can
>> be large.
>>
>> In my test, when the state size reaches about 400 MB, checkpointing takes
>> 40 seconds and consumes most of the TaskManager's CPU.
>> Is there a good way to handle this situation?
>>
>> The Flink documentation mentions incremental snapshots, and I am
>> interested in them, but I could not find how to enable them. (Not
>> implemented yet?)
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/stream_checkpointing.html
>>
>> Regards,
>> Hironori
