spark-commits mailing list archives

From zsxw...@apache.org
Subject spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table
Date Mon, 19 Jun 2017 17:59:08 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 f7fcdec6c -> 7b50736c4


[SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table

## What changes were proposed in this pull request?

The descriptions of several options of the File Source for Structured Streaming appeared under the
File Sink instead.

This pull request has two commits: the first fixes the options as they appeared in Spark 2.1,
and the second handles an additional option added in Spark 2.2.

## How was this patch tested?

Built the documentation with `SKIP_API=1 jekyll build` and visually inspected the Structured Streaming
programming guide.

The original documentation was written by tdas and lw-lin

Author: assafmendelson <assaf.mendelson@gmail.com>

Closes #18342 from assafmendelson/spark-21123.

(cherry picked from commit 66a792cd88c63cc0a1d20cbe14ac5699afbb3662)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b50736c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b50736c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b50736c

Branch: refs/heads/branch-2.2
Commit: 7b50736c45f379841466ae5730323b153313f400
Parents: f7fcdec
Author: assafmendelson <assaf.mendelson@gmail.com>
Authored: Mon Jun 19 10:58:58 2017 -0700
Committer: Shixiong Zhu <shixiong@databricks.com>
Committed: Mon Jun 19 10:59:06 2017 -0700

----------------------------------------------------------------------
 docs/structured-streaming-programming-guide.md | 28 +++++++++++----------
 1 file changed, 15 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7b50736c/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 9b9177d..d478042 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -510,7 +510,20 @@ Here are the details of all the sources in Spark.
     <td><b>File source</b></td>
     <td>
        <code>path</code>: path to the input directory, and common to all file formats.
-        <br/><br/>
+        <br/>
+        <code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
+        <br/>
+        <code>latestFirst</code>: whether to process the latest new files first, useful when there is a large backlog of files (default: false)
+        <br/>
+        <code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
+        <br/>
+        · "file:///dataset.txt"<br/>
+        · "s3://a/dataset.txt"<br/>
+        · "s3n://a/b/dataset.txt"<br/>
+        · "s3a://a/b/c/dataset.txt"<br/>
+        <br/>
+
+        <br/>
         For file-format-specific options, see the related methods in <code>DataStreamReader</code>
        (<a href="api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader">Python</a>/<a
         href="api/R/read.stream.html">R</a>).
@@ -1234,18 +1247,7 @@ Here are the details of all the sinks in Spark.
     <td>Append</td>
     <td>
         <code>path</code>: path to the output directory, must be specified.
-        <br/>
-        <code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
-        <br/>
-        <code>latestFirst</code>: whether to process the latest new files first, useful when there is a large backlog of files (default: false)
-        <br/>
-        <code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
-        <br/>
-        · "file:///dataset.txt"<br/>
-        · "s3://a/dataset.txt"<br/>
-        · "s3n://a/b/dataset.txt"<br/>
-        · "s3a://a/b/c/dataset.txt"<br/>
-        <br/>
+        <br/><br/>
         For file-format-specific options, see the related methods in DataFrameWriter
        (<a href="api/scala/index.html#org.apache.spark.sql.DataFrameWriter">Scala</a>/<a href="api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">Python</a>/<a
         href="api/R/write.stream.html">R</a>).
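

The `fileNameOnly` matching rule documented in the moved hunk (comparing only the final path component rather than the full URI) can be sketched in plain Python. This is an illustration of the documented semantics only, not Spark's implementation; the helper name `file_key` is hypothetical:

```python
from urllib.parse import urlparse
import posixpath

def file_key(uri: str, file_name_only: bool) -> str:
    """Return the identity key a file source would use for a URI:
    the full URI by default, or just the final path component when
    fileNameOnly is enabled (illustrative sketch only)."""
    if file_name_only:
        return posixpath.basename(urlparse(uri).path)
    return uri

# The four example URIs from the documentation above.
uris = [
    "file:///dataset.txt",
    "s3://a/dataset.txt",
    "s3n://a/b/dataset.txt",
    "s3a://a/b/c/dataset.txt",
]

# With fileNameOnly enabled, all four URIs collapse to one key, "dataset.txt" ...
print({file_key(u, file_name_only=True) for u in uris})

# ... whereas the default (full path) keeps them distinct.
print(len({file_key(u, file_name_only=False) for u in uris}))
```

Enabling `fileNameOnly` therefore deduplicates files that share a name across schemes and directories, which is why the guide warns it treats these four URIs as the same file.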



