spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad
Date Wed, 12 Oct 2016 07:40:47 GMT
Repository: spark
Updated Branches:
  refs/heads/master b512f04f8 -> c264ef9b1


[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad

## What changes were proposed in this pull request?

Documentation fix to make it clear that reusing a group.id across different streams is a bad idea, just like it is with the underlying Kafka consumer.

## How was this patch tested?

I built jekyll doc and made sure it looked ok.

Author: cody koeninger <cody@koeninger.org>

Closes #15442 from koeninger/SPARK-17853.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c264ef9b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c264ef9b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c264ef9b

Branch: refs/heads/master
Commit: c264ef9b1918256a5018c7a42a1a2b42308ea3f7
Parents: b512f04
Author: cody koeninger <cody@koeninger.org>
Authored: Wed Oct 12 00:40:47 2016 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Wed Oct 12 00:40:47 2016 -0700

----------------------------------------------------------------------
 docs/streaming-kafka-0-10-integration.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c264ef9b/docs/streaming-kafka-0-10-integration.md
----------------------------------------------------------------------
diff --git a/docs/streaming-kafka-0-10-integration.md b/docs/streaming-kafka-0-10-integration.md
index 44c39e3..456b845 100644
--- a/docs/streaming-kafka-0-10-integration.md
+++ b/docs/streaming-kafka-0-10-integration.md
@@ -27,7 +27,7 @@ For Scala/Java applications using SBT/Maven project definitions, link your strea
 	  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
 	  "key.deserializer" -> classOf[StringDeserializer],
 	  "value.deserializer" -> classOf[StringDeserializer],
-	  "group.id" -> "example",
+	  "group.id" -> "use_a_separate_group_id_for_each_stream",
 	  "auto.offset.reset" -> "latest",
 	  "enable.auto.commit" -> (false: java.lang.Boolean)
 	)
@@ -48,7 +48,7 @@ Each item in the stream is a [ConsumerRecord](http://kafka.apache.org/0100/javad
 </div>
 
 For possible kafkaParams, see [Kafka consumer config docs](http://kafka.apache.org/documentation.html#newconsumerconfigs).
-Note that enable.auto.commit is disabled, for discussion see [Storing Offsets](streaming-kafka-0-10-integration.html#storing-offsets) below.
+Note that the example sets enable.auto.commit to false, for discussion see [Storing Offsets](streaming-kafka-0-10-integration.html#storing-offsets) below.
 
 ### LocationStrategies
 The new Kafka consumer API will pre-fetch messages into buffers.  Therefore it is important for performance reasons that the Spark integration keep cached consumers on executors (rather than recreating them for each batch), and prefer to schedule partitions on the host locations that have the appropriate consumers.
@@ -57,6 +57,9 @@ In most cases, you should use `LocationStrategies.PreferConsistent` as shown abo
 
 The cache for consumers has a default maximum size of 64.  If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via `spark.streaming.kafka.consumer.cache.maxCapacity`
 
+The cache is keyed by topicpartition and group.id, so use a **separate** `group.id` for each call to `createDirectStream`.
+
+
 ### ConsumerStrategies
 The new Kafka consumer API has a number of different ways to specify topics, some of which require considerable post-object-instantiation setup.  `ConsumerStrategies` provides an abstraction that allows Spark to obtain properly configured consumers even after restart from checkpoint.
 
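To illustrate the guidance in this patch, the sketch below shows two `createDirectStream` calls, each with its own `group.id`, so their cached consumers (keyed by topicpartition and group.id) never collide. This is a hedged example, not part of the commit: it assumes an already-configured `StreamingContext` named `streamingContext` and the spark-streaming-kafka-0-10 artifact on the classpath; the topic names, group ids, and broker hosts are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Common settings shared by both streams; broker hosts are placeholders.
def kafkaParams(groupId: String): Map[String, Object] = Map(
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> groupId, // a separate group.id per stream, per this patch
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Two streams, two distinct group ids.
val streamA = KafkaUtils.createDirectStream[String, String](
  streamingContext, PreferConsistent,
  Subscribe[String, String](Seq("topicA"), kafkaParams("group_for_stream_a")))

val streamB = KafkaUtils.createDirectStream[String, String](
  streamingContext, PreferConsistent,
  Subscribe[String, String](Seq("topicB"), kafkaParams("group_for_stream_b")))
```

Reusing one group.id for both calls would make the two streams share consumer cache entries for any overlapping topicpartitions, which is exactly the misuse this documentation change warns against.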



