spark-reviews mailing list archives

From lonelytrooper <...@git.apache.org>
Subject [GitHub] spark pull request #19274: [SPARK-22056] Add subconcurrency for KafkaRDDPart...
Date Tue, 19 Sep 2017 08:29:21 GMT
GitHub user lonelytrooper opened a pull request:

    https://github.com/apache/spark/pull/19274

    [SPARK-22056] Add subconcurrency for KafkaRDDPartition

    JIRA Issue: https://issues.apache.org/jira/browse/SPARK-22056
    
    When Spark Streaming consumes data from Kafka using the direct approach, each Kafka partition
currently maps to exactly one KafkaRDDPartition in Spark Streaming (a bijection). To increase the
computing capacity of Spark Streaming, we usually increase the number of partitions in Kafka, but
adding too many can cause problems in Kafka, such as with leader election.
    So we introduce a new mechanism that changes the bijection to a one-to-many mapping, controlled
by a new parameter named "topic.partition.subconcurrency". This mechanism divides one
KafkaRDDPartition into several according to the parameter, which lets Spark Streaming use computing
resources more efficiently and avoids the problems caused by increasing the number of Kafka partitions.
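
As a rough illustration of the one-to-many idea described above, the sketch below splits one partition's offset range into contiguous sub-ranges. It is not taken from the patch: the function name `split_offset_range` and the even-split policy are assumptions made for illustration, and the PR description does not say how the actual division is done. The only borrowed convention is that an offset range is half-open, with the starting offset inclusive and the ending offset exclusive.

```python
from typing import List, Tuple


def split_offset_range(
    from_offset: int, until_offset: int, subconcurrency: int
) -> List[Tuple[int, int]]:
    """Split the half-open range [from_offset, until_offset) into at most
    `subconcurrency` contiguous, non-overlapping sub-ranges.

    Hypothetical helper: the name and the even-split policy are assumptions,
    not the behavior of the actual patch.
    """
    if subconcurrency < 1:
        raise ValueError("subconcurrency must be >= 1")

    total = until_offset - from_offset
    if total <= 0:
        return []  # nothing to read in this batch

    # Never create more sub-ranges than there are records.
    parts = min(subconcurrency, total)
    base, extra = divmod(total, parts)

    ranges = []
    start = from_offset
    for i in range(parts):
        # The first `extra` sub-ranges take one additional record each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # One Kafka partition covering offsets 100..109, split three ways.
    print(split_offset_range(100, 110, 3))
```

Under this sketch, each sub-range would back its own RDD partition, so a single Kafka partition could be processed by several tasks while record order is preserved only within each sub-range.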
 
    
    
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual
tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lonelytrooper/spark add_partition_concurrency

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19274
    
----
commit a89663411e568f265103f0b695168d4db68a2b36
Author: bjyfhanfei <yfhanfei@jd.com>
Date:   2017-09-04T09:00:25Z

    add partition subconcurrency

commit d1132195d6b2087be4f18ad25614836c46512fe7
Author: bjyfhanfei <yfhanfei@jd.com>
Date:   2017-09-19T06:12:29Z

    add topic.partition.subconcurrency

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

