spark-commits mailing list archives

From t...@apache.org
Subject spark git commit: [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure
Date Tue, 08 Sep 2015 21:54:57 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 7fd4674fc -> 63c72b93e


[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8656 from tdas/SPARK-10492 and squashes the following commits:

986cdd6 [Tathagata Das] Added information on backpressure

(cherry picked from commit 52b24a602ad615a7f6aa427aefb1c7444c05d298)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63c72b93
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63c72b93
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63c72b93

Branch: refs/heads/branch-1.5
Commit: 63c72b93eb51685814543a39caf9a6d221e2583c
Parents: 7fd4674
Author: Tathagata Das <tathagata.das1565@gmail.com>
Authored: Tue Sep 8 14:54:43 2015 -0700
Committer: Tathagata Das <tathagata.das1565@gmail.com>
Committed: Tue Sep 8 14:54:54 2015 -0700

----------------------------------------------------------------------
 docs/configuration.md               | 13 +++++++++++++
 docs/streaming-programming-guide.md | 13 ++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/63c72b93/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 77c5cbc..353efdb 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1438,6 +1438,19 @@ Apart from these, the following properties are also available, and may be useful
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
+  <td><code>spark.streaming.backpressure.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enables or disables Spark Streaming's internal backpressure mechanism (since 1.5).
+    This enables the Spark Streaming to control the receiving rate based on the 
+    current batch scheduling delays and processing times so that the system receives
+    only as fast as the system can process. Internally, this dynamically sets the 
+    maximum receiving rate of receivers. This rate is upper bounded by the values
+    `spark.streaming.receiver.maxRate` and `spark.streaming.kafka.maxRatePerPartition`
+    if they are set (see below).
+  </td>
+</tr>
+<tr>
   <td><code>spark.streaming.blockInterval</code></td>
   <td>200ms</td>
   <td>

http://git-wip-us.apache.org/repos/asf/spark/blob/63c72b93/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index a1acf83..c751dbb 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -1807,7 +1807,7 @@ To run a Spark Streaming applications, you need to have the following.
     + *Mesos* - [Marathon](https://github.com/mesosphere/marathon) has been used to achieve this
       with Mesos.
 
-- *[Since Spark 1.2] Configuring write ahead logs* - Since Spark 1.2,
+- *Configuring write ahead logs* - Since Spark 1.2,
   we have introduced _write ahead logs_ for achieving strong
   fault-tolerance guarantees. If enabled,  all the data received from a receiver gets written into
   a write ahead log in the configuration checkpoint directory. This prevents data loss on driver
@@ -1822,6 +1822,17 @@ To run a Spark Streaming applications, you need to have the following.
   stored in a replicated storage system. This can be done by setting the storage level for the
   input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
 
+- *Setting the max receiving rate* - If the cluster resources is not large enough for the streaming
+  application to process data as fast as it is being received, the receivers can be rate limited
+  by setting a maximum rate limit in terms of records / sec.
+  See the [configuration parameters](configuration.html#spark-streaming)
+  `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
+  for Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
+  eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
+  rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+  rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+  can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
+  `spark.streaming.backpressure.enabled` to `true`.
+
 ### Upgrading Application Code
 {:.no_toc}
 

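The three properties documented in this commit can be supplied at submit time. The following is a minimal sketch, not part of the commit: it uses only the Python standard library, the numeric values are illustrative assumptions, and the helper names `to_submit_args` and `max_records_per_batch` are hypothetical. It renders the settings as `spark-submit --conf key=value` arguments and works out the per-batch ceiling that a static records/sec limit implies.

```python
# Illustrative values only (assumptions); tune them for the actual workload.
streaming_conf = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.receiver.maxRate": "10000",          # records/sec per receiver
    "spark.streaming.kafka.maxRatePerPartition": "1000",  # records/sec per Kafka partition
}


def to_submit_args(conf):
    """Render a config dict as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def max_records_per_batch(rate_per_sec, parallel_units, batch_seconds):
    """Ceiling on records admitted in one batch under a static rate limit.

    parallel_units is the number of receivers (receiver-based streams) or
    Kafka partitions (direct approach) that the limit applies to.
    """
    return rate_per_sec * parallel_units * batch_seconds


if __name__ == "__main__":
    print("spark-submit " + " ".join(to_submit_args(streaming_conf)))
    # 8 Kafka partitions, 2-second batches, 1000 records/sec per partition
    print(max_records_per_batch(1000, 8, 2))
```

With backpressure enabled, the two static limits are not required; when they are set, they act as the upper bound on the dynamically chosen rate, as the configuration.md text in this commit describes.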
