From: Aljoscha Krettek
Date: Thu, 07 Apr 2016 08:48:19 +0000
Subject: Re: RocksDB state checkpointing is expensive?
To: user@flink.apache.org

Hi,
you are right.
Currently there is no incremental checkpointing, so at each checkpoint we essentially copy the whole RocksDB database to HDFS (or whichever filesystem you chose as a backup location). As far as I know, Stephan will start working on adding support for incremental snapshots this week or next week.

Cheers,
Aljoscha

On Thu, 7 Apr 2016 at 09:55 Krzysztof Zarzycki wrote:
> Hi,
> I saw the documentation and source code of the state management with
> RocksDB, and before I use it, I'm concerned about one thing: Am I right
> that currently, when state is checkpointed, the whole RocksDB state is
> snapshotted? There is no incremental, diff-based snapshotting, is there?
> If so, this seems infeasible for state measured in tens or hundreds of
> GBs (and you reach that size of state when you want to keep an embedded
> state in the streaming application instead of going out to
> Cassandra/HBase or another DB). It will just cost too much to snapshot
> such large state.
>
> Samza, as a good point of comparison, writes every state change to a
> Kafka topic, treating it as a snapshot in the shape of a changelog. Of
> course, at the moment of an app restart, recovering the state from the
> changelog would be too costly, which is why the changelog topic is
> compacted. Plus, I think Samza does a state snapshot from time to time
> anyway (but I'm not sure of that).
>
> Thanks for answering my doubts,
> Krzysztof
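The changelog-compaction idea Krzysztof describes can be sketched in a few lines of Python. This is a toy model, not Samza's or Kafka's actual implementation: the point is that compaction keeps only the latest entry per key, so replaying a compacted changelog costs time proportional to the number of live keys, not the number of historical updates.

```python
# Toy model of changelog-based state recovery (illustration only, not
# Samza's real API): every state update is appended to a log; compaction
# keeps only the latest entry per key, bounding recovery cost by the
# number of live keys rather than the full update history.

def compact(changelog):
    """Keep only the last (key, value) entry for each key, in dict order."""
    latest = {}
    for key, value in changelog:
        latest[key] = value  # later entries overwrite earlier ones
    return list(latest.items())

def recover(changelog):
    """Rebuild the state store by replaying a changelog from the start."""
    state = {}
    for key, value in changelog:
        state[key] = value
    return state

# Five updates, but only two live keys survive compaction; recovery from
# the compacted log yields the same state as replaying the full log.
log = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
compacted = compact(log)
assert recover(compacted) == recover(log) == {"a": 5, "b": 4}
assert len(compacted) == 2
```

Kafka's log compaction applies the same per-key retention idea, which is what keeps Samza's restore time bounded by state size rather than by how many updates were ever written.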