spark-reviews mailing list archives

From tdas <...@git.apache.org>
Subject [GitHub] spark pull request #16468: [SPARK-19074][SS][DOCS] Updated Structured Stream...
Date Thu, 05 Jan 2017 23:52:01 GMT
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16468#discussion_r94877350
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -954,49 +1014,93 @@ There are a few types of built-in output sinks.
     
     - **File sink** - Stores the output to a directory. 
     
    +{% highlight scala %}
    +writeStream
    +    .format("parquet")        // can be "orc", "json", "csv", etc.
    +    .option("path", "path/to/destination/dir")
    +    .start()
    +{% endhighlight %}
    +
     - **Foreach sink** - Runs arbitrary computation on the records in the output. See later in the section for more details.
     
    +{% highlight scala %}
    +writeStream
    +    .foreach(...)
    +    .start()
    +{% endhighlight %}
    +
     - **Console sink (for debugging)** - Prints the output to the console/stdout every time there is a trigger. Both, Append and Complete output modes, are supported. This should be used for debugging purposes on low data volumes as the entire output is collected and stored in the driver's memory after every trigger.
     
    -- **Memory sink (for debugging)** - The output is stored in memory as an in-memory table. Both, Append and Complete output modes, are supported. This should be used for debugging purposes on low data volumes as the entire output is collected and stored in the driver's memory after every trigger.
    +{% highlight scala %}
    +writeStream
    +    .format("console")
    +    .start()
    +{% endhighlight %}
    +
    +- **Memory sink (for debugging)** - The output is stored in memory as an in-memory table.
    +Both, Append and Complete output modes, are supported. This should be used for debugging purposes
    +on low data volumes as the entire output is collected and stored in the driver's memory after
    --- End diff --
    
    I kind of agree it's repetitive, but I don't want people to miss this.



