From: Gyula Fóra
Date: Thu, 07 Jan 2016 10:17:28 +0000
Subject: Re: [DISCUSS] Refactor StateBackend into Partitioned State and Non-Partitioned State Backends
To: dev@flink.apache.org

Hi,

+1

I think it would be a good idea to separate the two state backends. I think
you are right: in most cases the new partitioned state implementations will
benefit from this, as it removes a lot of additional overhead (although
sometimes it's nice to have the two together, for instance if they both use
the filesystem) :)

Cheers,
Gyula

Aljoscha Krettek wrote (on Thu, Jan 7, 2016, 11:02):

> Hi,
> I'm currently examining ways to 1) change the window operators to use the
> partitioned state abstraction for window state and 2) implement state
> backends for managed memory/out-of-core state.
>
> I think it would be helpful to pull the state backend apart. Right now,
> for example, the DbStateBackend has a custom way of specifying another
> state backend that should be used for non-partitioned state, since a
> database really only makes sense for partitioned state. I was thinking about
> adding a state backend based on RocksDB, which would also only make sense
> for partitioned state. Pulling the two kinds of state apart would allow the
> implementation to focus on the important parts and give the user
> flexibility without requiring every state backend to implement this.
>
> What do you think about pulling the backends apart?
>
> --
> Aljoscha
>
> P.S. I have a prototype WindowOperator on partitioned state that does not
> regress in performance compared to the current WindowOperator. Also, I have
> a prototype RocksDB state backend. Here, the performance is about 1/10th
> of that of the in-memory state backend (with 100,000 keys), but it
> scales way better (with the in-memory state backend, performance goes down
> when increasing the number of keys, while it stays constant with RocksDB).
> This is quite nice since it allows using the same windowing code while
> exchanging the state backend based on the job requirements.
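
To make the proposed split concrete, here is a minimal sketch of how the two kinds of backend might be pulled apart and then combined per job. Every name in it (KeyValueState, PartitionedStateBackend, NonPartitionedStateBackend, the Heap* classes, StateBackends) is hypothetical and invented for illustration; none of it is Flink's actual API or the prototype mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical keyed (partitioned) state handle: one value per key.
interface KeyValueState<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Hypothetical backend that only knows how to hold partitioned state.
// A database- or RocksDB-style backend would implement just this interface.
interface PartitionedStateBackend {
    <K, V> KeyValueState<K, V> createKeyValueState(String name);
}

// Hypothetical backend that only handles non-partitioned (operator) state.
interface NonPartitionedStateBackend {
    void snapshot(String name, byte[] state);
    byte[] restore(String name);
}

// In-memory stand-in for a partitioned backend.
final class HeapPartitionedStateBackend implements PartitionedStateBackend {
    @Override
    public <K, V> KeyValueState<K, V> createKeyValueState(String name) {
        final Map<K, V> store = new HashMap<>();
        return new KeyValueState<K, V>() {
            @Override public V get(K key) { return store.get(key); }
            @Override public void put(K key, V value) { store.put(key, value); }
        };
    }
}

// In-memory stand-in for a non-partitioned backend.
final class HeapNonPartitionedStateBackend implements NonPartitionedStateBackend {
    private final Map<String, byte[]> snapshots = new HashMap<>();

    @Override
    public void snapshot(String name, byte[] state) {
        snapshots.put(name, state.clone());
    }

    @Override
    public byte[] restore(String name) {
        byte[] stored = snapshots.get(name);
        return stored == null ? null : stored.clone();
    }
}

// The user picks one backend of each kind, so neither backend has to
// carry its own way of delegating to the other.
final class StateBackends {
    final PartitionedStateBackend partitioned;
    final NonPartitionedStateBackend nonPartitioned;

    StateBackends(PartitionedStateBackend partitioned,
                  NonPartitionedStateBackend nonPartitioned) {
        this.partitioned = partitioned;
        this.nonPartitioned = nonPartitioned;
    }
}

class StateBackendSketch {
    public static void main(String[] args) {
        StateBackends backends = new StateBackends(
                new HeapPartitionedStateBackend(),
                new HeapNonPartitionedStateBackend());

        // Window state goes through the partitioned backend.
        KeyValueState<String, Long> counts =
                backends.partitioned.createKeyValueState("window-counts");
        counts.put("user-1", 3L);
        System.out.println(counts.get("user-1"));

        // Operator-level state goes through the non-partitioned backend.
        backends.nonPartitioned.snapshot("source-offset", new byte[] {42});
        System.out.println(backends.nonPartitioned.restore("source-offset")[0]);
    }
}
```

With this shape, swapping the partitioned half for a different implementation would not touch the windowing code or the non-partitioned half, which is the flexibility the proposal is after.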