From: Stephan Ewen
Date: Fri, 24 Feb 2017 16:00:05 +0100
Subject: Re: Checkpointing with RocksDB as statebackend
To: user@flink.apache.org
Reply-To: user@flink.apache.org
In-Reply-To: <1487944701219-11882.post@n4.nabble.com>
References: <1487753727469-11799.post@n4.nabble.com> <1487859615288-11831.post@n4.nabble.com> <1487941734848-11879.post@n4.nabble.com> <1487944701219-11882.post@n4.nabble.com>

Hi Vinay!

True, the operator state (like that of the Kafka source) is currently not
checkpointed asynchronously.

While that state is rather small, we have seen before that it can cause
trouble on S3, because S3 frequently stalls uploads of even a few kilobytes
of data due to its throttling policies.

That would be a super important fix to add!

Best,
Stephan

On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <vinay18.patil@gmail.com> wrote:

> Hi,
>
> I have attached a snapshot for reference.
> As you can see, all 3 checkpoints failed; checkpoints 2 and 3 are stuck
> at the Kafka source after 50%.
> (So far Kafka source 1 has sent 65 GB and source 2 has sent 15 GB.)
>
> Within 10 minutes 15M records were processed, and for the next 16 minutes
> the pipeline has been stuck. I don't see any progress beyond 15M because
> the checkpoints are failing consistently.
>
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11882/Checkpointing_Failed.png>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11882.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.