Mailing-List: contact issues-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
Date: Thu, 14 Sep 2017 09:10:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)" <jira@apache.org>
To: issues@flink.apache.org
Message-ID: <JIRA.13097452.1503613061000.110830.1505380200969@Atlassian.JIRA>
In-Reply-To: <JIRA.13097452.1503613061000@Atlassian.JIRA>
References: <JIRA.13097452.1503613061000@Atlassian.JIRA> <JIRA.13097452.1503613061402@jira-lw-us.apache.org>
Subject: [jira] [Commented] (FLINK-7508) switch FlinkKinesisProducer to use
 KPL's ThreadingMode to ThreadedPool mode rather than Per_Request mode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 14 Sep 2017 09:10:10 -0000


    [ https://issues.apache.org/jira/browse/FLINK-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165976#comment-16165976 ] 

ASF GitHub Bot commented on FLINK-7508:
---------------------------------------

Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4656#discussion_r138837100
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -171,8 +171,9 @@ public void open(Configuration parameters) throws Exception {
     		super.open(parameters);
     
     		// check and pass the configuration properties
    -		KinesisProducerConfiguration producerConfig = KinesisConfigUtil.validateProducerConfiguration(configProps);
    +		KinesisProducerConfiguration producerConfig = KinesisConfigUtil.getValidatedProducerConfiguration(configProps);
     		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
    +		producerConfig.setThreadingModel(KinesisProducerConfiguration.ThreadingModel.POOLED);
    --- End diff --
    
    Do you think it will make sense to allow the user to configure different threading models?


> switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode rather than Per_Request mode
> --------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-7508
>                 URL: https://issues.apache.org/jira/browse/FLINK-7508
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.3.0
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Critical
>             Fix For: 1.4.0
>
>
> KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which is very expensive.
> 0.12.4 introduced a new [ThreadingMode - Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225], which will use a thread pool. This hugely improves KPL's performance and reduces consumed resources. By default, KPL still uses per-request mode. We should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.
> This work depends on FLINK-7366 and FLINK-7508
> Benchmarking I did:
> * Environment: Running a Flink hourly-sliding windowing job on 18-node EMR cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job generates about 21million UserRecords, which means that we generated a test load of 21million UserRecords at the first minute of each hour.
> * Criteria: Test KPL throughput per minute. Since the default RecordTTL for KPL is 30 sec, we can be sure that either all UserRecords are sent by KPL within a minute, or we will see UserRecord expiration errors.
> * One-New-Thread-Per-Request model: max throughput is about 2million UserRecords per min; it doesn't go beyond that because CPU utilization goes to 100%, everything stopped working and that Flink job crashed.
> * Thread-Pool model with pool size of 10: it sends out 21million UserRecords within 30 sec without any UserRecord expiration errors. The average peak CPU utilization is about 20% - 30%. So 21million UserRecords/min is not the max throughput of thread-pool model. We didn't go any further because 1) this throughput is already a couple times more than what we really need, and 2) we don't have a quick way of increasing the test load
> Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. [~tzulitai] What do you think


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)