flink-issues mailing list archives

From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-7223) Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
Date Wed, 19 Jul 2017 16:35:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093375#comment-16093375 ]

Tzu-Li (Gordon) Tai edited comment on FLINK-7223 at 7/19/17 4:34 PM:
---------------------------------------------------------------------

This config affects only how quickly the consumer notices new shards of streams it already
subscribes to, not new streams (unlike the current Kafka consumer in master, the Kinesis
consumer does not support stream pattern subscription).
I agree that the proposed 5 minutes is a bit too long as a default.

One other thing: the AWS limitation is "This operation has a limit of 10 transactions per
second per account." Therefore, IMO, even if we change this discovery interval to 5 minutes,
you would still bump into the limitation if the source parallelism is high (each source instance
performs discovery independently).
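
To make the arithmetic concrete, here is a rough sketch in Java. It assumes every subtask issues one {{describeStream}} call per subscribed stream per discovery round, and that all subtasks fire at about the same moment; the class name and the parallelism of 64 are made up for illustration. The average rate looks harmless at a 5 minute interval, but the burst does not.

```java
// Back-of-the-envelope sketch of shard discovery request rates.
// Assumptions (not taken from the connector code): one describeStream call
// per subtask per stream per round, and all subtasks firing together.
final class DiscoveryRateSketch {
    // Account-wide cap quoted from the AWS documentation: 10 calls per second.
    static final int ACCOUNT_LIMIT_PER_SECOND = 10;

    /** Average calls per second if the requests were spread perfectly evenly. */
    static double averageCallsPerSecond(int parallelism, int streams, long intervalMillis) {
        return parallelism * streams * 1000.0 / intervalMillis;
    }

    /** Calls landing in one burst if every subtask runs discovery at the same instant. */
    static int burstCalls(int parallelism, int streams) {
        return parallelism * streams;
    }

    /** True if a single synchronized burst alone is above the per-second cap. */
    static boolean burstExceedsLimit(int parallelism, int streams) {
        return burstCalls(parallelism, streams) > ACCOUNT_LIMIT_PER_SECOND;
    }

    public static void main(String[] args) {
        int parallelism = 64;        // hypothetical source parallelism
        int streams = 1;             // hypothetical number of subscribed streams
        long intervalMillis = 300_000L; // the proposed 5 minute interval

        System.out.printf("average: %.3f calls/s, burst: %d calls, over limit: %b%n",
                averageCallsPerSecond(parallelism, streams, intervalMillis),
                burstCalls(parallelism, streams),
                burstExceedsLimit(parallelism, streams));
    }
}
```

With these numbers the average stays far below 10 calls per second, while one synchronized round alone is several times over it, and that is before counting any other application on the same account.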

We could consider:
1. Being a little less strict when handling exceptions caused by hitting the rate limit
2. Perhaps adding a little more jitter to the discovery interval to even out the requests across
multiple subtasks.
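
A minimal sketch of both ideas in Java, not the connector's actual implementation. {{RateLimitedException}} is a hypothetical stand-in for whatever exception the AWS SDK throws when the cap is hit, and the class and method names are invented for this example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Sketch only: jittered discovery interval plus lenient retry on rate limiting.
final class DiscoveryBackoffSketch {
    /** Hypothetical stand-in for the SDK's rate-limit exception. */
    static final class RateLimitedException extends RuntimeException {
        RateLimitedException(String message) {
            super(message);
        }
    }

    /** Base interval plus a random offset in [0, maxJitterMillis], so subtasks drift apart. */
    static long jitteredIntervalMillis(long baseMillis, long maxJitterMillis) {
        return baseMillis + ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
    }

    /**
     * Runs the call and, on a rate-limit error, backs off and retries
     * instead of failing on the first rejection.
     */
    static <T> T callWithBackoff(Supplier<T> call, int maxAttempts, long baseBackoffMillis) {
        long backoff = baseBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up only after exhausting all attempts
                }
            }
            try {
                // Randomized wait in [backoff, 2 * backoff] before the next attempt.
                Thread.sleep(jitteredIntervalMillis(backoff, backoff));
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", ie);
            }
            backoff *= 2; // exponential growth between attempts
        }
    }
}
```

Each subtask would sleep for {{jitteredIntervalMillis(interval, someJitter)}} between discovery rounds and wrap the {{describeStream}} call in {{callWithBackoff}}. Whether the jitter should be a fixed amount or a fraction of the interval is an open design choice.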

My concern is that simply increasing the discovery interval would not really solve the
problem of {{describeStream}} API rate limitations.


was (Author: tzulitai):
This config affects the responsiveness to new shards of the stream (unlike the current Kafka
consumer in master, Kinesis consumer does not support stream pattern subscription).

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-7223
>                 URL: https://issues.apache.org/jira/browse/FLINK-7223
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.3.0
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
> is the default interval at which Flink calls Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000 ms (10 sec), which is too short. We ran into
> problems where the Flink-kinesis-connector's calls to {{describeStream()}} exceeded the Kinesis
> rate limit and broke the Flink taskmanager.
> According to http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
> "This operation has a limit of 10 transactions per second per account." This means
> that the limit of 10 transactions per second applies to a single organization's entire AWS account. :(
> We contacted AWS Support and confirmed this. If you have other applications (either other
> Flink apps or non-Flink apps) competing aggressively with your Flink app for this API, your
> Flink app breaks.
> I propose increasing the value of {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} from 10,000 ms (10 sec)
> to preferably 300,000 ms (5 min), or at least 60,000 ms (1 min) if anyone has a solid reason
> to argue that 5 min is too long.
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365
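
Until the default changes, the interval can presumably be overridden per job through the consumer properties. A small sketch in Java follows. The key string is my recollection of the value behind {{ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS}} and should be checked against your connector version; the helper class itself is hypothetical.

```java
import java.util.Properties;

// Sketch only: builds consumer properties with a custom shard discovery interval.
final class DiscoveryIntervalConfigSketch {
    // Assumed to match ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS;
    // verify against the connector version in use.
    static final String SHARD_DISCOVERY_INTERVAL_KEY = "flink.shard.discovery.intervalmillis";

    /** Returns a copy of the given properties with the discovery interval set. */
    static Properties withDiscoveryInterval(Properties base, long intervalMillis) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(SHARD_DISCOVERY_INTERVAL_KEY, Long.toString(intervalMillis));
        return props;
    }

    public static void main(String[] args) {
        // 60,000 ms is the 1 minute fallback value mentioned in the proposal.
        Properties props = withDiscoveryInterval(new Properties(), 60_000L);
        System.out.println(props.getProperty(SHARD_DISCOVERY_INTERVAL_KEY));
    }
}
```

The resulting {{Properties}} object would then be passed to the Kinesis consumer's constructor along with the usual region and credential settings.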



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
