Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 564B0200C6A for ; Wed, 19 Apr 2017 09:43:24 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 54DC1160B94; Wed, 19 Apr 2017 07:43:24 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 9D61B160B86 for ; Wed, 19 Apr 2017 09:43:23 +0200 (CEST) Received: (qmail 44964 invoked by uid 500); 19 Apr 2017 07:43:22 -0000 Mailing-List: contact issues-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list issues@flink.apache.org Received: (qmail 44955 invoked by uid 99); 19 Apr 2017 07:43:22 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 19 Apr 2017 07:43:22 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 4C215185E94 for ; Wed, 19 Apr 2017 07:43:22 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -4.02 X-Spam-Level: X-Spam-Status: No, score=-4.02 tagged_above=-999 required=6.31 tests=[KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id rllXUfIROHrz for ; Wed, 19 Apr 2017 07:43:21 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with SMTP id F2C945FC96 for ; Wed, 19 Apr 2017 07:43:20 +0000 (UTC) Received: (qmail 44857 invoked by uid 99); 19 Apr 2017 07:43:20 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 19 Apr 2017 07:43:20 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id DCAC4DFBC8; Wed, 19 Apr 2017 07:43:19 +0000 (UTC) From: tony810430 To: issues@flink.incubator.apache.org Reply-To: issues@flink.incubator.apache.org References: In-Reply-To: Subject: [GitHub] flink pull request #3001: [FLINK-4821] [kinesis] Implement rescalable non-pa... Content-Type: text/plain Message-Id: <20170419074319.DCAC4DFBC8@git1-us-west.apache.org> Date: Wed, 19 Apr 2017 07:43:19 +0000 (UTC) archived-at: Wed, 19 Apr 2017 07:43:24 -0000 Github user tony810430 commented on a diff in the pull request: https://github.com/apache/flink/pull/3001#discussion_r112134112 --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java --- @@ -194,26 +216,30 @@ public void run(SourceContext sourceContext) throws Exception { // all subtasks will run a fetcher, regardless of whether or not the subtask will initially have // shards to subscribe to; fetchers will continuously poll for changes in the shard list, so all subtasks // can potentially have new shards to subscribe to later on - fetcher = new KinesisDataFetcher<>( - streams, sourceContext, getRuntimeContext(), configProps, deserializer); + fetcher = createFetcher(streams, sourceContext, getRuntimeContext(), configProps, deserializer); boolean isRestoringFromFailure = (sequenceNumsToRestore != null); fetcher.setIsRestoringFromFailure(isRestoringFromFailure); // if we are restoring from a checkpoint, we iterate over the restored // state and accordingly seed the fetcher with subscribed shards states if (isRestoringFromFailure) { - for (Map.Entry restored : lastStateSnapshot.entrySet()) { + List newShardsCreatedWhileNotRunning = fetcher.discoverNewShardsToSubscribe(); + for (KinesisStreamShard shard : newShardsCreatedWhileNotRunning) { fetcher.advanceLastDiscoveredShardOfStream( - restored.getKey().getStreamName(), restored.getKey().getShard().getShardId()); + shard.getStreamName(), shard.getShard().getShardId()); + + SequenceNumber startingStateForNewShard = lastStateSnapshot.containsKey(shard) + ? lastStateSnapshot.get(shard) + : SentinelSequenceNumber.SENTINEL_EARLIEST_SEQUENCE_NUM.get(); --- End diff -- I just refer the behavior when `KinesisDataFetcher` started consuming data from Kinesis. It always assign `SentinelSequenceNumber.SENTINEL_EARLIEST_SEQUENCE_NUM.get()` for newly discovered shards. It's okey to discuss which initial sequence number is proper for new shards here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastructure@apache.org or file a JIRA ticket with INFRA. ---