spark-commits mailing list archives

From brk...@apache.org
Subject spark git commit: [DSTREAM][DOC] Add documentation for kinesis retry configurations
Date Thu, 18 May 2017 18:24:38 GMT
Repository: spark
Updated Branches:
  refs/heads/master 8fb3d5c6d -> 92580bd0e


[DSTREAM][DOC] Add documentation for kinesis retry configurations

## What changes were proposed in this pull request?

The retry configurations were merged as part of https://github.com/apache/spark/pull/17467,
but their documentation was dropped somewhere in the review iterations. This change adds the
documentation where it belongs.

## How was this patch tested?
Documentation-only change; not tested.

cc budde, brkyvz

Author: Yash Sharma <ysharma@atlassian.com>

Closes #18028 from yssharma/ysharma/kinesis_retry_docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92580bd0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92580bd0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92580bd0

Branch: refs/heads/master
Commit: 92580bd0eae5dbf739573093cca1b12fd0c14049
Parents: 8fb3d5c
Author: Yash Sharma <ysharma@atlassian.com>
Authored: Thu May 18 11:24:33 2017 -0700
Committer: Burak Yavuz <brkyvz@gmail.com>
Committed: Thu May 18 11:24:33 2017 -0700

----------------------------------------------------------------------
 docs/streaming-kinesis-integration.md | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/92580bd0/docs/streaming-kinesis-integration.md
----------------------------------------------------------------------
diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md
index 6be0b54..9709bd3 100644
--- a/docs/streaming-kinesis-integration.md
+++ b/docs/streaming-kinesis-integration.md
@@ -216,3 +216,7 @@ de-aggregate records during consumption.
 - If no Kinesis checkpoint info exists when the input DStream starts, it will start either
from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest
tip (`InitialPositionInStream.LATEST`).  This is configurable.
   - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the
stream while no input DStreams are running (and no checkpoint info is being stored).
   - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where
the impact is dependent on checkpoint frequency and processing idempotency.
+
+#### Kinesis retry configuration
+ - `spark.streaming.kinesis.retry.waitTime` : Wait time between Kinesis retries as a duration
string. When reading from Amazon Kinesis, users may hit `ProvisionedThroughputExceededException`s
when consuming faster than 5 transactions/second or exceeding the maximum read rate of 2
MB/second. This configuration can be tweaked to increase the sleep between fetches when a
fetch fails, which reduces these exceptions. Default is "100ms".
+ - `spark.streaming.kinesis.retry.maxAttempts` : Max number of retries for Kinesis fetches.
This config can also be used to tackle the Kinesis `ProvisionedThroughputExceededException`s
in the scenarios mentioned above. It can be increased to allow more retries for Kinesis
reads. Default is 3.
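
As a rough illustration of how the two properties documented in this diff might be supplied to a job, here is a minimal sketch that renders them as `--conf` flags for `spark-submit`. The property names and defaults come from the diff above; the helper names `kinesis_retry_conf` and `to_submit_args` are hypothetical, and the values `500ms` and `5` are illustrative assumptions rather than tuning recommendations.

```python
# Hypothetical helpers (not part of Spark): render the Kinesis retry settings
# documented above as spark-submit --conf flags.


def kinesis_retry_conf(wait_time="100ms", max_attempts=3):
    """Return the two retry properties as a dict of strings.

    The defaults mirror the documented defaults ("100ms" and 3).
    """
    return {
        "spark.streaming.kinesis.retry.waitTime": wait_time,
        "spark.streaming.kinesis.retry.maxAttempts": str(max_attempts),
    }


def to_submit_args(conf):
    """Flatten a conf dict into ['--conf', 'key=value', ...] in insertion order."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Illustrative values only: a longer wait and more attempts than the defaults.
    print(" ".join(to_submit_args(kinesis_retry_conf("500ms", 5))))
```

The same key/value pairs could presumably also be placed in `spark-defaults.conf` or set on a `SparkConf` before the streaming context is created; which route fits best depends on how the application is deployed.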


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org

