spark-commits mailing list archives

From ma...@apache.org
Subject git commit: SPARK-4040. Update documentation to exemplify use of local (n) value, fo...
Date Wed, 05 Nov 2014 23:45:47 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 cf2f676f9 -> fe4ead299


SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

Author: jay@apache.org <jayunit100>

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [jay@apache.org] SPARK-4040: Update documentation to exemplify use of local (n) value.

(cherry picked from commit 868cd4c3ca11e6ecc4425b972d9a20c360b52425)
Signed-off-by: Matei Zaharia <matei@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe4ead29
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe4ead29
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe4ead29

Branch: refs/heads/branch-1.2
Commit: fe4ead2995ab8529602090ed21941b6005a07c9d
Parents: cf2f676
Author: jay@apache.org <jayunit100>
Authored: Wed Nov 5 15:45:34 2014 -0800
Committer: Matei Zaharia <matei@databricks.com>
Committed: Wed Nov 5 15:45:43 2014 -0800

----------------------------------------------------------------------
 docs/configuration.md               | 10 ++++++++--
 docs/streaming-programming-guide.md | 14 +++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/fe4ead29/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 685101e..0f9eb81 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -21,16 +21,22 @@ application. These properties can be set directly on a
 [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
 `SparkContext`. `SparkConf` allows you to configure some of the common properties
 (e.g. master URL and application name), as well as arbitrary key-value pairs through the
-`set()` method. For example, we could initialize an application as follows:
+`set()` method. For example, we could initialize an application with two threads as follows:
+
+Note that we run with local[2], meaning two threads - which represents "minimal" parallelism,
+which can help detect bugs that only exist when we run in a distributed context. 
 
 {% highlight scala %}
 val conf = new SparkConf()
-             .setMaster("local")
+             .setMaster("local[2]")
              .setAppName("CountingSheep")
              .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
+Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
+require one to prevent any sort of starvation issues.  
+
 ## Dynamically Loading Spark Properties
 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, if you'd like to run the same application with different masters or different

http://git-wip-us.apache.org/repos/asf/spark/blob/fe4ead29/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 8bbba88..44a1f3a 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -68,7 +68,9 @@ import org.apache.spark._
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._
 
-// Create a local StreamingContext with two working thread and batch interval of 1 second
+// Create a local StreamingContext with two working thread and batch interval of 1 second.
+// The master requires 2 cores to prevent from a starvation scenario.
+
 val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
 val ssc = new StreamingContext(conf, Seconds(1))
 {% endhighlight %}
@@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver](
 
 A receiver is run within a Spark worker/executor as a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Hence, it is important to remember that Spark Streaming application needs to be allocated enough cores to process the received data, as well as, to run the receiver(s). Therefore, few important points to remember are:
 
-##### Points to remember:
+##### Points to remember
 {:.no_toc}
-- If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
-- When running locally, if you master URL is set to "local", then there is only one core to run tasks.  That is insufficient for programs with even one input DStream (file streams are okay) as the receiver will occupy that core and there will be no core left to process the data.
-
+- If the number of threads allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
+- When running locally, if you master URL is set to "local", then there is only one core to run tasks.  That is insufficient for programs using a DStream as the receiver (file streams are okay).  So, a "local" master URL in a streaming app is generally going to cause starvation for the processor.  
+Thus in any streaming app, you generally will want to allocate more than one thread (i.e. set your master to "local[2]") when testing locally.
+See [Spark Properties] (configuration.html#spark-properties.html).
+  
 ### Basic Sources
 {:.no_toc}
 
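Editorial note: the rule in the updated "Points to remember" list (a local streaming app needs more threads than receivers) can be sketched in plain Scala without a Spark dependency. This is only an illustration under stated assumptions: `LocalMasterCheck`, `localThreads`, and `canProcess` are hypothetical names, not part of Spark, and the helper recognizes only the two master URL forms mentioned in this commit (`local` and `local[n]`).

```scala
// Hypothetical helper, not a Spark API: it only mirrors the rule described in the docs change.
object LocalMasterCheck {
  // Matches master URLs of the form local[n], where n is a decimal number.
  private val LocalN = """local\[(\d+)\]""".r

  /** Threads implied by a local master URL, or None for any other URL form. */
  def localThreads(master: String): Option[Int] = master match {
    case "local"   => Some(1)       // plain "local" means a single thread
    case LocalN(n) => Some(n.toInt) // "local[n]" means n threads
    case _         => None          // not a local URL this sketch understands
  }

  /**
   * True when at least one thread is left for processing after each receiver
   * has occupied one. For unrecognized master URLs the answer is unknown, so
   * this sketch returns true rather than guess.
   */
  def canProcess(master: String, receivers: Int): Boolean =
    localThreads(master).forall(_ > receivers)

  def main(args: Array[String]): Unit = {
    println(canProcess("local", 1))    // false: the receiver takes the only thread
    println(canProcess("local[2]", 1)) // true: one thread receives, one processes
    println(canProcess("local[2]", 2)) // false: both threads are taken by receivers
  }
}
```

With one receiver, `local[2]` is the smallest local master that passes this check, which matches the value the commit uses in both documentation examples.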


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org

