spark-issues mailing list archives

From "vijayant soni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-26086) Spark streaming max records per batch interval
Date Fri, 16 Nov 2018 07:33:00 GMT
vijayant soni created SPARK-26086:
-------------------------------------

             Summary: Spark streaming max records per batch interval
                 Key: SPARK-26086
                 URL: https://issues.apache.org/jira/browse/SPARK-26086
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 2.3.1
            Reporter: vijayant soni


We have a Spark Streaming application that reads from Kinesis and writes to Redshift.

*Configuration*:

Number of receivers = 5

Batch interval = 10 mins

spark.streaming.receiver.maxRate = 2000 (records per second)

According to this config, the maximum number of records that can be read in a single batch is:

{{Max records per batch = batch_interval (seconds) * number of receivers * maxRate = (10 * 60) * 5 * 2000 = 6,000,000}}
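As a sanity check on the arithmetic, the expected upper bound can be sketched like this (assuming {{spark.streaming.receiver.maxRate}} caps each receiver independently, which is the basis of the calculation above):

```python
# Expected upper bound on records per batch, per the reported configuration.
batch_interval_s = 10 * 60  # 10-minute batch interval, in seconds
num_receivers = 5           # number of Kinesis receivers
max_rate = 2000             # spark.streaming.receiver.maxRate (records/sec/receiver)

max_records_per_batch = batch_interval_s * num_receivers * max_rate
print(max_records_per_batch)  # 6000000
```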

But the actual number of records per batch is more than this maximum:

Batch I - 6,005,886 records

Batch II - 6,001,623 records

Batch III - 6,010,148 records

Please note that the receivers are not even reading at the max rate; the records read per receiver are near 1,900 per second.
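The size of the overshoot in each reported batch follows directly from the figures above:

```python
# Overshoot of each observed batch relative to the 6,000,000-record cap.
cap = 6_000_000
observed = [6_005_886, 6_001_623, 6_010_148]  # Batches I, II, III

overages = [n - cap for n in observed]
print(overages)  # [5886, 1623, 10148]
```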



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
