spark-commits mailing list archives

From van...@apache.org
Subject [39/51] [partial] spark-website git commit: Add docs for Spark 2.3.1.
Date Mon, 11 Jun 2018 18:16:54 GMT
http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.gbt.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.gbt.html b/site/docs/2.3.1/api/R/spark.gbt.html
new file mode 100644
index 0000000..70a90d2
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.gbt.html
@@ -0,0 +1,257 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Gradient Boosted Tree Model for Regression and Classification</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.gbt {SparkR}"><tr><td>spark.gbt {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Gradient Boosted Tree Model for Regression and Classification</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.gbt</code> fits a Gradient Boosted Tree Regression model or Classification model on a
+SparkDataFrame. Users can call <code>summary</code> to get a summary of the fitted
+Gradient Boosted Tree model, <code>predict</code> to make predictions on new data, and
+<code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+For more details, see
+<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression">
+GBT Regression</a> and
+<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier">
+GBT Classification</a>.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.gbt(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.gbt(data, formula,
+  type = c("regression", "classification"), maxDepth = 5, maxBins = 32,
+  maxIter = 20, stepSize = 0.1, lossType = NULL, seed = NULL,
+  subsamplingRate = 1, minInstancesPerNode = 1, minInfoGain = 0,
+  checkpointInterval = 10, maxMemoryInMB = 256, cacheNodeIds = FALSE,
+  handleInvalid = c("error", "keep", "skip"))
+
+## S4 method for signature 'GBTRegressionModel'
+summary(object)
+
+## S3 method for class 'summary.GBTRegressionModel'
+print(x, ...)
+
+## S4 method for signature 'GBTClassificationModel'
+summary(object)
+
+## S3 method for class 'summary.GBTClassificationModel'
+print(x, ...)
+
+## S4 method for signature 'GBTRegressionModel'
+predict(object, newData)
+
+## S4 method for signature 'GBTClassificationModel'
+predict(object, newData)
+
+## S4 method for signature 'GBTRegressionModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+
+## S4 method for signature 'GBTClassificationModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>type</code></td>
+<td>
+<p>type of model to fit, one of &quot;regression&quot; or &quot;classification&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>maxDepth</code></td>
+<td>
+<p>Maximum depth of the tree (&gt;= 0).</p>
+</td></tr>
+<tr valign="top"><td><code>maxBins</code></td>
+<td>
+<p>Maximum number of bins used for discretizing continuous features and for choosing
+how to split on features at each node. More bins give higher granularity. Must be
+&gt;= 2 and &gt;= number of categories in any categorical feature.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>Maximum number of iterations (&gt;= 0).</p>
+</td></tr>
+<tr valign="top"><td><code>stepSize</code></td>
+<td>
+<p>Step size to be used for each iteration of optimization.</p>
+</td></tr>
+<tr valign="top"><td><code>lossType</code></td>
+<td>
+<p>Loss function which GBT tries to minimize.
+For classification, must be &quot;logistic&quot;. For regression, must be one of
+&quot;squared&quot; (L2) or &quot;absolute&quot; (L1); the default is &quot;squared&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>seed</code></td>
+<td>
+<p>integer seed for random number generation.</p>
+</td></tr>
+<tr valign="top"><td><code>subsamplingRate</code></td>
+<td>
+<p>Fraction of the training data used for learning each decision tree, in
+range (0, 1].</p>
+</td></tr>
+<tr valign="top"><td><code>minInstancesPerNode</code></td>
+<td>
+<p>Minimum number of instances each child must have after split. If a
+split causes the left or right child to have fewer than
+minInstancesPerNode, the split will be discarded as invalid. Should be
+&gt;= 1.</p>
+</td></tr>
+<tr valign="top"><td><code>minInfoGain</code></td>
+<td>
+<p>Minimum information gain for a split to be considered at a tree node.</p>
+</td></tr>
+<tr valign="top"><td><code>checkpointInterval</code></td>
+<td>
+<p>Checkpoint interval (&gt;= 1), or -1 to disable checkpointing.
+Note: this setting will be ignored if the checkpoint directory is not
+set.</p>
+</td></tr>
+<tr valign="top"><td><code>maxMemoryInMB</code></td>
+<td>
+<p>Maximum memory in MB allocated to histogram aggregation.</p>
+</td></tr>
+<tr valign="top"><td><code>cacheNodeIds</code></td>
+<td>
+<p>If FALSE, the algorithm will pass trees to executors to match instances with
+nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
+can speed up training of deeper trees. Users can set how often the cache should
+be checkpointed, or disable checkpointing, by setting checkpointInterval.</p>
+</td></tr>
+<tr valign="top"><td><code>handleInvalid</code></td>
+<td>
+<p>How to handle invalid data (unseen labels or NULL values) in features and
+label column of string type in classification model.
+Supported options: &quot;skip&quot; (filter out rows with invalid data),
+&quot;error&quot; (throw an error), &quot;keep&quot; (put invalid data in
+a special additional bucket, at index numLabels). Default
+is &quot;error&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>A fitted Gradient Boosted Tree regression model or classification model.</p>
+</td></tr>
+<tr valign="top"><td><code>x</code></td>
+<td>
+<p>summary object of Gradient Boosted Tree regression model or classification model
+returned by <code>summary</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>The directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>Whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.gbt</code> returns a fitted Gradient Boosted Tree model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list of components includes <code>formula</code> (formula),
+<code>numFeatures</code> (number of features), <code>features</code> (list of features),
+<code>featureImportances</code> (feature importances), <code>maxDepth</code> (max depth of trees),
+<code>numTrees</code> (number of trees), and <code>treeWeights</code> (tree weights).
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
+&quot;prediction&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.gbt since 2.1.0
+</p>
+<p>summary(GBTRegressionModel) since 2.1.0
+</p>
+<p>print.summary.GBTRegressionModel since 2.1.0
+</p>
+<p>summary(GBTClassificationModel) since 2.1.0
+</p>
+<p>print.summary.GBTClassificationModel since 2.1.0
+</p>
+<p>predict(GBTRegressionModel) since 2.1.0
+</p>
+<p>predict(GBTClassificationModel) since 2.1.0
+</p>
+<p>write.ml(GBTRegressionModel, character) since 2.1.0
+</p>
+<p>write.ml(GBTClassificationModel, character) since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D # fit a Gradient Boosted Tree Regression Model
+##D df &lt;- createDataFrame(longley)
+##D model &lt;- spark.gbt(df, Employed ~ ., type = &quot;regression&quot;, maxDepth = 5, maxBins = 16)
+##D 
+##D # get the summary of the model
+##D summary(model)
+##D 
+##D # make predictions
+##D predictions &lt;- predict(model, df)
+##D 
+##D # save and load the model
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+##D 
+##D # fit a Gradient Boosted Tree Classification Model
+##D # label must be binary - Only binary classification is supported for GBT.
+##D t &lt;- as.data.frame(Titanic)
+##D df &lt;- createDataFrame(t)
+##D model &lt;- spark.gbt(df, Survived ~ Age + Freq, &quot;classification&quot;)
+##D 
+##D # numeric label is also supported
+##D t2 &lt;- as.data.frame(Titanic)
+##D t2$NumericGender &lt;- ifelse(t2$Sex == &quot;Male&quot;, 0, 1)
+##D df &lt;- createDataFrame(t2)
+##D model &lt;- spark.gbt(df, NumericGender ~ ., type = &quot;classification&quot;)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
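
The roles of maxIter and stepSize documented above can be illustrated with a toy boosting loop in base R. This is only a sketch under stated assumptions: it uses squared loss, a single feature, and depth-1 trees, and the helper names (fit_stump, predict_stump, boost) are hypothetical. It is not how Spark implements spark.gbt.

```r
# Toy gradient boosting for squared loss; NOT Spark's implementation.
# fit_stump, predict_stump and boost are hypothetical helper names.

# Fit a depth-1 regression tree (stump) on one numeric feature.
fit_stump <- function(x, r) {
  thresholds <- sort(unique(x))
  best <- list(sse = Inf)
  # The largest value is skipped so the right child is never empty.
  for (threshold in thresholds[-length(thresholds)]) {
    left <- x <= threshold
    pred_left <- mean(r[left])
    pred_right <- mean(r[!left])
    sse <- sum((r[left] - pred_left)^2) + sum((r[!left] - pred_right)^2)
    if (sse < best$sse) {
      best <- list(sse = sse, cut = threshold,
                   left = pred_left, right = pred_right)
    }
  }
  best
}

predict_stump <- function(s, x) ifelse(x <= s$cut, s$left, s$right)

boost <- function(x, y, maxIter = 20, stepSize = 0.1) {
  pred <- rep(mean(y), length(y))   # start from the constant model
  mse <- numeric(maxIter)
  for (i in seq_len(maxIter)) {
    residual <- y - pred            # negative gradient of squared loss
    stump <- fit_stump(x, residual)
    pred <- pred + stepSize * predict_stump(stump, x)
    mse[i] <- mean((y - pred)^2)
  }
  list(pred = pred, mse = mse)
}

# Same dataset as the regression example above, restricted to one feature.
x <- longley$GNP
y <- longley$Employed
fit <- boost(x, y)
print(round(fit$mse, 4))
```

Each iteration adds one small tree scaled by stepSize, so the training error shrinks gradually; a smaller stepSize generally needs a larger maxIter to reach the same fit.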

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.getSparkFiles.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.getSparkFiles.html b/site/docs/2.3.1/api/R/spark.getSparkFiles.html
new file mode 100644
index 0000000..785ebe9
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.getSparkFiles.html
@@ -0,0 +1,59 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Get the absolute path of a file added through spark.addFile.</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.getSparkFiles {SparkR}"><tr><td>spark.getSparkFiles {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Get the absolute path of a file added through spark.addFile.</h2>
+
+<h3>Description</h3>
+
+<p>Get the absolute path of a file added through spark.addFile.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.getSparkFiles(fileName)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>fileName</code></td>
+<td>
+<p>The name of the file added through spark.addFile</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p>the absolute path of a file added through spark.addFile.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.getSparkFiles since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D spark.getSparkFiles(&quot;myfile&quot;)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.getSparkFilesRootDirectory.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.getSparkFilesRootDirectory.html b/site/docs/2.3.1/api/R/spark.getSparkFilesRootDirectory.html
new file mode 100644
index 0000000..9585cff
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.getSparkFilesRootDirectory.html
@@ -0,0 +1,49 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Get the root directory that contains files added through...</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.getSparkFilesRootDirectory {SparkR}"><tr><td>spark.getSparkFilesRootDirectory {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Get the root directory that contains files added through spark.addFile.</h2>
+
+<h3>Description</h3>
+
+<p>Get the root directory that contains files added through spark.addFile.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.getSparkFilesRootDirectory()
+</pre>
+
+
+<h3>Value</h3>
+
+<p>the root directory that contains files added through spark.addFile
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.getSparkFilesRootDirectory since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D spark.getSparkFilesRootDirectory()
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.glm.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.glm.html b/site/docs/2.3.1/api/R/spark.glm.html
new file mode 100644
index 0000000..2be0f81
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.glm.html
@@ -0,0 +1,234 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Generalized Linear Models</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.glm {SparkR}"><tr><td>spark.glm {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Generalized Linear Models</h2>
+
+<h3>Description</h3>
+
+<p>Fits a generalized linear model against a SparkDataFrame.
+Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
+predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.glm(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.glm(data, formula, family = gaussian,
+  tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0,
+  var.power = 0, link.power = 1 - var.power,
+  stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", "alphabetDesc",
+  "alphabetAsc"), offsetCol = NULL)
+
+## S4 method for signature 'GeneralizedLinearRegressionModel'
+summary(object)
+
+## S3 method for class 'summary.GeneralizedLinearRegressionModel'
+print(x, ...)
+
+## S4 method for signature 'GeneralizedLinearRegressionModel'
+predict(object, newData)
+
+## S4 method for signature 'GeneralizedLinearRegressionModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>family</code></td>
+<td>
+<p>a description of the error distribution and link function to be used in the model.
+This can be a character string naming a family function, a family function or
+the result of a call to a family function. Refer to R family at
+<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html</a>.
+Currently these families are supported: <code>binomial</code>, <code>gaussian</code>,
+<code>Gamma</code>, <code>poisson</code> and <code>tweedie</code>.
+</p>
+<p>Note that there are two ways to specify the tweedie family.
+</p>
+
+<ul>
+<li><p> Set <code>family = "tweedie"</code> and specify the var.power and link.power;
+</p>
+</li>
+<li><p> When package <code>statmod</code> is loaded, the tweedie family is specified
+using the family definition therein, i.e., <code>tweedie(var.power, link.power)</code>.
+</p>
+</li></ul>
+</td></tr>
+<tr valign="top"><td><code>tol</code></td>
+<td>
+<p>positive convergence tolerance of iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>integer giving the maximal number of IRLS iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>weightCol</code></td>
+<td>
+<p>the weight column name. If this is not set or <code>NULL</code>, we treat all instance
+weights as 1.0.</p>
+</td></tr>
+<tr valign="top"><td><code>regParam</code></td>
+<td>
+<p>regularization parameter for L2 regularization.</p>
+</td></tr>
+<tr valign="top"><td><code>var.power</code></td>
+<td>
+<p>the power in the variance function of the Tweedie distribution which provides
+the relationship between the variance and mean of the distribution. Only
+applicable to the Tweedie family.</p>
+</td></tr>
+<tr valign="top"><td><code>link.power</code></td>
+<td>
+<p>the index in the power link function. Only applicable to the Tweedie family.</p>
+</td></tr>
+<tr valign="top"><td><code>stringIndexerOrderType</code></td>
+<td>
+<p>how to order categories of a string feature column. This is used to
+decide the base level of a string feature, because the last category
+after ordering is dropped when encoding strings. Supported options
+are &quot;frequencyDesc&quot;, &quot;frequencyAsc&quot;, &quot;alphabetDesc&quot;, and
+&quot;alphabetAsc&quot;. The default value is &quot;frequencyDesc&quot;. When the
+ordering is set to &quot;alphabetDesc&quot;, this drops the same category
+as R when encoding strings.</p>
+</td></tr>
+<tr valign="top"><td><code>offsetCol</code></td>
+<td>
+<p>the offset column name. If this is not set or empty, we treat all instance
+offsets as 0.0. The feature specified as offset has a constant coefficient of
+1.0.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a fitted generalized linear model.</p>
+</td></tr>
+<tr valign="top"><td><code>x</code></td>
+<td>
+<p>summary object of fitted generalized linear model returned by <code>summary</code> function.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>the directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.glm</code> returns a fitted generalized linear model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list of components includes at least the <code>coefficients</code> (coefficients matrix,
+which includes coefficients, standard error of coefficients, t value and p value),
+<code>null.deviance</code> (null/residual degrees of freedom), <code>aic</code> (AIC)
+and <code>iter</code> (number of iterations IRLS takes). If there are collinear columns in
+the data, the coefficients matrix only provides coefficients.
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
+&quot;prediction&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.glm since 2.0.0
+</p>
+<p>summary(GeneralizedLinearRegressionModel) since 2.0.0
+</p>
+<p>print.summary.GeneralizedLinearRegressionModel since 2.0.0
+</p>
+<p>predict(GeneralizedLinearRegressionModel) since 1.5.0
+</p>
+<p>write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p><a href="glm.html">glm</a>, <a href="read.ml.html">read.ml</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D sparkR.session()
+##D t &lt;- as.data.frame(Titanic, stringsAsFactors = FALSE)
+##D df &lt;- createDataFrame(t)
+##D model &lt;- spark.glm(df, Freq ~ Sex + Age, family = &quot;gaussian&quot;)
+##D summary(model)
+##D 
+##D # fitted values on training data
+##D fitted &lt;- predict(model, df)
+##D head(select(fitted, &quot;Freq&quot;, &quot;prediction&quot;))
+##D 
+##D # save fitted model to input path
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D 
+##D # can also read back the saved model and print
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+##D 
+##D # note that the default string encoding is different from R&#39;s glm
+##D model2 &lt;- glm(Freq ~ Sex + Age, family = &quot;gaussian&quot;, data = t)
+##D summary(model2)
+##D # use stringIndexerOrderType = &quot;alphabetDesc&quot; to force string encoding
+##D # to be consistent with R
+##D model3 &lt;- spark.glm(df, Freq ~ Sex + Age, family = &quot;gaussian&quot;,
+##D                    stringIndexerOrderType = &quot;alphabetDesc&quot;)
+##D summary(model3)
+##D 
+##D # fit tweedie model
+##D model &lt;- spark.glm(df, Freq ~ Sex + Age, family = &quot;tweedie&quot;,
+##D                    var.power = 1.2, link.power = 0)
+##D summary(model)
+##D 
+##D # use the tweedie family from statmod
+##D library(statmod)
+##D model &lt;- spark.glm(df, Freq ~ Sex + Age, family = tweedie(1.2, 0))
+##D summary(model)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
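
The statement that stringIndexerOrderType = "alphabetDesc" drops the same category as R can be checked without a Spark session. The following base R sketch assumes only the rule stated in the argument description (the last category after ordering is dropped); the variable names are illustrative.

```r
# Base R only; no Spark session is needed for this check.
t <- as.data.frame(Titanic, stringsAsFactors = FALSE)

# "alphabetDesc": order categories in descending alphabetical order,
# then drop the last one (rule taken from the argument description).
alphabet_desc <- sort(unique(t$Sex), decreasing = TRUE)
dropped <- alphabet_desc[length(alphabet_desc)]

# R's glm() uses the first factor level as the reference (dropped) level.
r_reference <- levels(factor(t$Sex))[1]

# The fitted coefficient names show which level R kept.
model2 <- glm(Freq ~ Sex + Age, family = "gaussian", data = t)
print(dropped)
print(r_reference)
print(names(coef(model2)))
```

Both rules pick "Female" as the base level for Sex, which is why the R fit reports a coefficient only for the remaining level.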

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.isoreg.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.isoreg.html b/site/docs/2.3.1/api/R/spark.isoreg.html
new file mode 100644
index 0000000..0237709
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.isoreg.html
@@ -0,0 +1,146 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Isotonic Regression Model</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.isoreg {SparkR}"><tr><td>spark.isoreg {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Isotonic Regression Model</h2>
+
+<h3>Description</h3>
+
+<p>Fits an Isotonic Regression model against a SparkDataFrame, similarly to R's isoreg().
+Users can print, make predictions on the produced model and save the model to the input path.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.isoreg(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.isoreg(data, formula,
+  isotonic = TRUE, featureIndex = 0, weightCol = NULL)
+
+## S4 method for signature 'IsotonicRegressionModel'
+summary(object)
+
+## S4 method for signature 'IsotonicRegressionModel'
+predict(object, newData)
+
+## S4 method for signature 'IsotonicRegressionModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>A symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>isotonic</code></td>
+<td>
+<p>Whether the output sequence should be isotonic/increasing (TRUE) or
+antitonic/decreasing (FALSE).</p>
+</td></tr>
+<tr valign="top"><td><code>featureIndex</code></td>
+<td>
+<p>The index of the feature if <code>featuresCol</code> is a vector column
+(default: 0), no effect otherwise.</p>
+</td></tr>
+<tr valign="top"><td><code>weightCol</code></td>
+<td>
+<p>The weight column name.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a fitted IsotonicRegressionModel.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>The directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>Whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.isoreg</code> returns a fitted Isotonic Regression model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes model's <code>boundaries</code> (boundaries in increasing order)
+and <code>predictions</code> (predictions associated with the boundaries at the same index).
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted values.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.isoreg since 2.1.0
+</p>
+<p>summary(IsotonicRegressionModel) since 2.1.0
+</p>
+<p>predict(IsotonicRegressionModel) since 2.1.0
+</p>
+<p>write.ml(IsotonicRegressionModel, character) since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D sparkR.session()
+##D data &lt;- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0),
+##D         list(5.0, 3.0), list(1.0, 4.0))
+##D df &lt;- createDataFrame(data, c(&quot;label&quot;, &quot;feature&quot;))
+##D model &lt;- spark.isoreg(df, label ~ feature, isotonic = FALSE)
+##D # return model boundaries and prediction as lists
+##D result &lt;- summary(model)
+##D # prediction based on fitted model
+##D predict_data &lt;- list(list(-2.0), list(-1.0), list(0.5),
+##D                 list(0.75), list(1.0), list(2.0), list(9.0))
+##D predict_df &lt;- createDataFrame(predict_data, c(&quot;feature&quot;))
+##D # get prediction column
+##D predict_result &lt;- collect(select(predict(model, predict_df), &quot;prediction&quot;))
+##D 
+##D # save fitted model to input path
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D 
+##D # can also read back the saved model and print
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
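
For comparison, the antitonic fit requested above with isotonic = FALSE can be reproduced locally with base R's isoreg(). This is a sketch using the stats package, not SparkR; it assumes that negating the response turns isoreg()'s increasing fit into a decreasing one, and Spark reports its result as boundaries and predictions rather than as fitted values.

```r
# Base R comparison using stats::isoreg; no Spark session is needed.
label   <- c(7, 5, 3, 5, 1)
feature <- c(0, 1, 2, 3, 4)

# isoreg() only fits increasing sequences, so fit the negated labels
# and negate the fitted values to obtain a decreasing (antitonic) fit.
fit <- isoreg(feature, -label)
antitonic_fit <- -fit$yf

print(antitonic_fit)
# 7 5 4 4 1
```

The labels 3 and 5 at features 2 and 3 violate the decreasing order, so they are pooled to their mean of 4.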

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.kmeans.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.kmeans.html b/site/docs/2.3.1/api/R/spark.kmeans.html
new file mode 100644
index 0000000..4893ea0
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.kmeans.html
@@ -0,0 +1,167 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: K-Means Clustering Model</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.kmeans {SparkR}"><tr><td>spark.kmeans {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>K-Means Clustering Model</h2>
+
+<h3>Description</h3>
+
+<p>Fits a k-means clustering model against a SparkDataFrame, similarly to R's kmeans().
+Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
+predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.kmeans(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.kmeans(data, formula, k = 2,
+  maxIter = 20, initMode = c("k-means||", "random"), seed = NULL,
+  initSteps = 2, tol = 1e-04)
+
+## S4 method for signature 'KMeansModel'
+summary(object)
+
+## S4 method for signature 'KMeansModel'
+predict(object, newData)
+
+## S4 method for signature 'KMeansModel,character'
+write.ml(object, path, overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.
+Note that the response variable of formula is empty in spark.kmeans.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional argument(s) passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>k</code></td>
+<td>
+<p>number of centers.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>maximum iteration number.</p>
+</td></tr>
+<tr valign="top"><td><code>initMode</code></td>
+<td>
+<p>the initialization algorithm chosen to fit the model.</p>
+</td></tr>
+<tr valign="top"><td><code>seed</code></td>
+<td>
+<p>the random seed for cluster initialization.</p>
+</td></tr>
+<tr valign="top"><td><code>initSteps</code></td>
+<td>
+<p>the number of steps for the k-means|| initialization mode.
+This is an advanced setting; the default of 2 is almost always enough.
+Must be &gt; 0.</p>
+</td></tr>
+<tr valign="top"><td><code>tol</code></td>
+<td>
+<p>convergence tolerance of iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a fitted k-means model.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>the directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.kmeans</code> returns a fitted k-means model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes the model's <code>k</code> (the configured number of cluster centers),
+<code>coefficients</code> (model cluster centers),
+<code>size</code> (number of data points in each cluster), <code>cluster</code>
+(cluster centers of the transformed data), <code>is.loaded</code> (whether the model is loaded
+from a saved file), and <code>clusterSize</code>
+(the actual number of cluster centers. When using initMode = &quot;random&quot;,
+<code>clusterSize</code> may not be equal to <code>k</code>).
+</p>
+<p><code>predict</code> returns the predicted values based on a k-means model.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.kmeans since 2.0.0
+</p>
+<p>summary(KMeansModel) since 2.0.0
+</p>
+<p>predict(KMeansModel) since 2.0.0
+</p>
+<p>write.ml(KMeansModel, character) since 2.0.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p><a href="predict.html">predict</a>, <a href="read.ml.html">read.ml</a>, <a href="write.ml.html">write.ml</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D sparkR.session()
+##D t &lt;- as.data.frame(Titanic)
+##D df &lt;- createDataFrame(t)
+##D model &lt;- spark.kmeans(df, Class ~ Survived, k = 4, initMode = &quot;random&quot;)
+##D summary(model)
+##D 
+##D # fitted values on training data
+##D fitted &lt;- predict(model, df)
+##D head(select(fitted, &quot;Class&quot;, &quot;prediction&quot;))
+##D 
+##D # save fitted model to input path
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D 
+##D # can also read back the saved model and print
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
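
Since the page above describes spark.kmeans as behaving similarly to R's kmeans(), a local base-R run can help show what the summary fields refer to. The following is only a sketch under that assumption: it uses stats::kmeans on made-up points, not SparkR, and the variable names (pts, centers, sizes, assignments, actual_k) are illustrative.

```r
# Local, base-R analogue of the fields described for summary(KMeansModel).
# This does not call SparkR; the data are synthetic and purely illustrative.
set.seed(1)  # fix the random initialization so runs are repeatable
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),  # 20 points near (0, 0)
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)   # 20 points near (5, 5)
)
fit <- stats::kmeans(pts, centers = 2, iter.max = 20)

centers     <- fit$centers       # compare with `coefficients` (cluster centers)
sizes       <- fit$size          # compare with `size` (points per cluster)
assignments <- fit$cluster       # per-point cluster assignment
actual_k    <- nrow(fit$centers) # compare with `clusterSize`
```

Here actual_k plays the role that the page assigns to clusterSize, which is documented as possibly differing from the requested k in SparkR when initMode = "random".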

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.kstest.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.kstest.html b/site/docs/2.3.1/api/R/spark.kstest.html
new file mode 100644
index 0000000..999724e
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.kstest.html
@@ -0,0 +1,130 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: (One-Sample) Kolmogorov-Smirnov Test</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.kstest {SparkR}"><tr><td>spark.kstest {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>(One-Sample) Kolmogorov-Smirnov Test</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.kstest</code> conducts the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a
+continuous distribution.
+</p>
+<p>By comparing the largest difference between the empirical cumulative
+distribution of the sample data and the theoretical distribution, we can provide a test for
+the null hypothesis that the sample data comes from that theoretical distribution.
+</p>
+<p>Users can call <code>summary</code> to obtain a summary of the test, and <code>print.summary.KSTest</code>
+to print out a summary result.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.kstest(data, ...)
+
+## S4 method for signature 'SparkDataFrame'
+spark.kstest(data, testCol = "test",
+  nullHypothesis = c("norm"), distParams = c(0, 1))
+
+## S4 method for signature 'KSTest'
+summary(object)
+
+## S3 method for class 'summary.KSTest'
+print(x, ...)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a SparkDataFrame of user data.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional argument(s) passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>testCol</code></td>
+<td>
+<p>column name where the test data is from. It should be a column of double type.</p>
+</td></tr>
+<tr valign="top"><td><code>nullHypothesis</code></td>
+<td>
+<p>name of the theoretical distribution tested against. Currently only
+<code>"norm"</code> for normal distribution is supported.</p>
+</td></tr>
+<tr valign="top"><td><code>distParams</code></td>
+<td>
+<p>parameter(s) of the distribution. For <code>nullHypothesis = "norm"</code>,
+we can provide as a vector the mean and standard deviation of
+the distribution. If none is provided, then standard normal will be used.
+If only one is provided, then the standard deviation will be set to be one.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>test result object of KSTest by <code>spark.kstest</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>x</code></td>
+<td>
+<p>summary object of KSTest returned by <code>summary</code>.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.kstest</code> returns a test result object.
+</p>
+<p><code>summary</code> returns summary information of KSTest object, which is a list.
+The list includes the <code>p.value</code> (p-value), <code>statistic</code> (test statistic
+computed for the test), <code>nullHypothesis</code> (the null hypothesis with its
+parameters tested against) and <code>degreesOfFreedom</code> (degrees of freedom of the test).
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.kstest since 2.1.0
+</p>
+<p>summary(KSTest) since 2.1.0
+</p>
+<p>print.summary.KSTest since 2.1.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p><a href="http://spark.apache.org/docs/latest/mllib-statistics.html#hypothesis-testing">
+MLlib: Hypothesis Testing</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D data &lt;- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25))
+##D df &lt;- createDataFrame(data)
+##D test &lt;- spark.kstest(df, &quot;test&quot;, &quot;norm&quot;, c(0, 1))
+##D 
+##D # get a summary of the test result
+##D testSummary &lt;- summary(test)
+##D testSummary
+##D 
+##D # print out the summary in an organized way
+##D print.summary.KSTest(testSummary)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
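
The description above defines the test through the largest difference between the empirical CDF of the sample and the theoretical CDF. As a rough local illustration (base R only, not SparkR), that statistic can be computed by hand for the same five values used in the example. The helper name ks_statistic is hypothetical; stats::ks.test is used only as a cross-check.

```r
# One-sample KS statistic against a normal distribution, computed by hand.
# `ks_statistic` is an illustrative helper, not part of SparkR.
ks_statistic <- function(x, mean = 0, sd = 1) {
  xs  <- sort(x)
  n   <- length(xs)
  cdf <- pnorm(xs, mean = mean, sd = sd)      # theoretical CDF at each point
  d_plus  <- max(seq_len(n) / n - cdf)        # ECDF just after each point
  d_minus <- max(cdf - (seq_len(n) - 1) / n)  # ECDF just before each point
  max(d_plus, d_minus)
}

sample_values <- c(0.1, 0.15, 0.2, 0.3, 0.25)
d <- ks_statistic(sample_values)

# Cross-check against base R's own implementation.
reference <- unname(stats::ks.test(sample_values, "pnorm", 0, 1)$statistic)
```

The mean and sd arguments correspond to the two entries of distParams in the page above, with the standard normal as the default.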

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.lapply.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.lapply.html b/site/docs/2.3.1/api/R/spark.lapply.html
new file mode 100644
index 0000000..eff791d
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.lapply.html
@@ -0,0 +1,95 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Run a function over a list of elements, distributing the...</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.lapply {SparkR}"><tr><td>spark.lapply {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Run a function over a list of elements, distributing the computations with Spark</h2>
+
+<h3>Description</h3>
+
+<p>Run a function over a list of elements, distributing the computations with Spark. Applies a
+function in a manner that is similar to doParallel or lapply to elements of a list.
+The computations are distributed using Spark. It is conceptually the same as the following code:
+lapply(list, func)
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.lapply(list, func)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>list</code></td>
+<td>
+<p>the list of elements</p>
+</td></tr>
+<tr valign="top"><td><code>func</code></td>
+<td>
+<p>a function that takes one argument.</p>
+</td></tr>
+</table>
+
+
+<h3>Details</h3>
+
+<p>Known limitations:
+</p>
+
+<ul>
+<li><p> variable scoping and capture: compared to R's rich support for variable resolutions,
+the distributed nature of SparkR limits how variables are resolved at runtime. All the
+variables that are available through lexical scoping are embedded in the closure of the
+function and available as read-only variables within the function. The environment variables
+should be stored into temporary variables outside the function, and not directly accessed
+within the function.
+</p>
+</li>
+<li><p> loading external packages: In order to use a package, you need to load it inside the
+closure. For example, if you rely on the MASS package, here is how you would use it:
+</p>
+<pre>
+    train &lt;- function(hyperparam) {
+      library(MASS)
+      model &lt;- lm.ridge(y ~ x + z, data, lambda = hyperparam)
+      model
+    }
+  </pre>
+</li></ul>
+
+
+
+<h3>Value</h3>
+
+<p>a list of results (the exact type being determined by the function)
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.lapply since 2.0.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D sparkR.session()
+##D doubled &lt;- spark.lapply(1:10, function(x){2 * x})
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
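
The two limitations listed above (variable capture and package loading) can be tried out locally with plain lapply, which the page calls conceptually equivalent. This is a minimal sketch that does not use Spark; scale_factor, tmp_dir and scale_and_tag are illustrative names, and "/tmp" is just an assumed fallback.

```r
# Capture values in ordinary variables outside the function; they become
# part of the closure and are treated as read-only inside it.
scale_factor <- 2
tmp_dir <- Sys.getenv("TMPDIR", unset = "/tmp")  # read the env var outside

scale_and_tag <- function(x) {
  library(stats)  # load any package the function needs inside its body
  list(value = scale_factor * x, dir = tmp_dir)
}

# Local stand-in for spark.lapply(1:3, scale_and_tag).
results <- lapply(1:3, scale_and_tag)
values  <- vapply(results, function(r) r$value, numeric(1))
```

Reading the environment variable once, outside the function, follows the advice above that environment variables should not be accessed directly within the distributed function.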

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.lda.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.lda.html b/site/docs/2.3.1/api/R/spark.lda.html
new file mode 100644
index 0000000..3be2b24
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.lda.html
@@ -0,0 +1,246 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Latent Dirichlet Allocation</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.lda {SparkR}"><tr><td>spark.lda {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Latent Dirichlet Allocation</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.lda</code> fits a Latent Dirichlet Allocation model on a SparkDataFrame. Users can call
+<code>summary</code> to get a summary of the fitted LDA model, <code>spark.posterior</code> to compute
+posterior probabilities on new data, <code>spark.perplexity</code> to compute log perplexity on new
+data and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.lda(data, ...)
+
+spark.posterior(object, newData)
+
+spark.perplexity(object, data)
+
+## S4 method for signature 'SparkDataFrame'
+spark.lda(data, features = "features", k = 10,
+  maxIter = 20, optimizer = c("online", "em"), subsamplingRate = 0.05,
+  topicConcentration = -1, docConcentration = -1,
+  customizedStopWords = "", maxVocabSize = bitwShiftL(1, 18))
+
+## S4 method for signature 'LDAModel'
+summary(object, maxTermsPerTopic)
+
+## S4 method for signature 'LDAModel,SparkDataFrame'
+spark.perplexity(object, data)
+
+## S4 method for signature 'LDAModel,SparkDataFrame'
+spark.posterior(object, newData)
+
+## S4 method for signature 'LDAModel,character'
+write.ml(object, path, overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>A SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional argument(s) passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>A Latent Dirichlet Allocation model fitted by <code>spark.lda</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>A SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>features</code></td>
+<td>
+<p>Features column name. Either libSVM-format column or character-format column is
+valid.</p>
+</td></tr>
+<tr valign="top"><td><code>k</code></td>
+<td>
+<p>Number of topics.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>Maximum iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>optimizer</code></td>
+<td>
+<p>Optimizer to train an LDA model, &quot;online&quot; or &quot;em&quot;, default is &quot;online&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>subsamplingRate</code></td>
+<td>
+<p>(For online optimizer) Fraction of the corpus to be sampled and used in
+each iteration of mini-batch gradient descent, in range (0, 1].</p>
+</td></tr>
+<tr valign="top"><td><code>topicConcentration</code></td>
+<td>
+<p>concentration parameter (commonly named <code>beta</code> or <code>eta</code>) for
+the prior placed on topic distributions over terms, default -1 to set automatically on the
+Spark side. Use <code>summary</code> to retrieve the effective topicConcentration. Only 1-size
+numeric is accepted.</p>
+</td></tr>
+<tr valign="top"><td><code>docConcentration</code></td>
+<td>
+<p>concentration parameter (commonly named <code>alpha</code>) for the
+prior placed on documents distributions over topics (<code>theta</code>), default -1 to set
+automatically on the Spark side. Use <code>summary</code> to retrieve the effective
+docConcentration. Only 1-size or <code>k</code>-size numeric is accepted.</p>
+</td></tr>
+<tr valign="top"><td><code>customizedStopWords</code></td>
+<td>
+<p>stopwords that need to be removed from the given corpus. Ignore the
+parameter if libSVM-format column is used as the features column.</p>
+</td></tr>
+<tr valign="top"><td><code>maxVocabSize</code></td>
+<td>
+<p>maximum vocabulary size, default 1 &lt;&lt; 18</p>
+</td></tr>
+<tr valign="top"><td><code>maxTermsPerTopic</code></td>
+<td>
+<p>Maximum number of terms to collect for each topic. Default value of 10.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>The directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>Whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.lda</code> returns a fitted Latent Dirichlet Allocation model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes
+</p>
+<table summary="R valueblock">
+<tr valign="top"><td><code><code>docConcentration</code></code></td>
+<td>
+<p>concentration parameter commonly named <code>alpha</code> for
+the prior placed on documents distributions over topics <code>theta</code></p>
+</td></tr>
+<tr valign="top"><td><code><code>topicConcentration</code></code></td>
+<td>
+<p>concentration parameter commonly named <code>beta</code> or
+<code>eta</code> for the prior placed on topic distributions over terms</p>
+</td></tr>
+<tr valign="top"><td><code><code>logLikelihood</code></code></td>
+<td>
+<p>log likelihood of the entire corpus</p>
+</td></tr>
+<tr valign="top"><td><code><code>logPerplexity</code></code></td>
+<td>
+<p>log perplexity</p>
+</td></tr>
+<tr valign="top"><td><code><code>isDistributed</code></code></td>
+<td>
+<p>TRUE for distributed model while FALSE for local model</p>
+</td></tr>
+<tr valign="top"><td><code><code>vocabSize</code></code></td>
+<td>
+<p>number of terms in the corpus</p>
+</td></tr>
+<tr valign="top"><td><code><code>topics</code></code></td>
+<td>
+<p>top 10 terms and their weights of all topics</p>
+</td></tr>
+<tr valign="top"><td><code><code>vocabulary</code></code></td>
+<td>
+<p>whole terms of the training corpus, NULL if libsvm format file
+used as training set</p>
+</td></tr>
+<tr valign="top"><td><code><code>trainingLogLikelihood</code></code></td>
+<td>
+<p>Log likelihood of the observed tokens in the
+training set, given the current parameter estimates:
+log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
+It is only for distributed LDA model (i.e., optimizer = &quot;em&quot;)</p>
+</td></tr>
+<tr valign="top"><td><code><code>logPrior</code></code></td>
+<td>
+<p>Log probability of the current parameter estimate:
+log P(topics, topic distributions for docs | Dirichlet hyperparameters)
+It is only for distributed LDA model (i.e., optimizer = &quot;em&quot;)</p>
+</td></tr>
+</table>
+<p><code>spark.perplexity</code> returns the log perplexity of given SparkDataFrame, or the log
+perplexity of the training data if the argument &quot;data&quot; is missing.
+</p>
+<p><code>spark.posterior</code> returns a SparkDataFrame containing posterior probabilities
+vectors named &quot;topicDistribution&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.lda since 2.1.0
+</p>
+<p>summary(LDAModel) since 2.1.0
+</p>
+<p>spark.perplexity(LDAModel) since 2.1.0
+</p>
+<p>spark.posterior(LDAModel) since 2.1.0
+</p>
+<p>write.ml(LDAModel, character) since 2.1.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p>topicmodels: <a href="https://cran.r-project.org/package=topicmodels">https://cran.r-project.org/package=topicmodels</a>
+</p>
+<p><a href="read.ml.html">read.ml</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D text &lt;- read.df(&quot;data/mllib/sample_lda_libsvm_data.txt&quot;, source = &quot;libsvm&quot;)
+##D model &lt;- spark.lda(data = text, optimizer = &quot;em&quot;)
+##D 
+##D # get a summary of the model
+##D summary(model)
+##D 
+##D # compute posterior probabilities
+##D posterior &lt;- spark.posterior(model, text)
+##D showDF(posterior)
+##D 
+##D # compute perplexity
+##D perplexity &lt;- spark.perplexity(model, text)
+##D 
+##D # save and load the model
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.logit.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.logit.html b/site/docs/2.3.1/api/R/spark.logit.html
new file mode 100644
index 0000000..05627af
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.logit.html
@@ -0,0 +1,269 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Logistic Regression Model</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.logit {SparkR}"><tr><td>spark.logit {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Logistic Regression Model</h2>
+
+<h3>Description</h3>
+
+<p>Fits a logistic regression model against a SparkDataFrame. It supports &quot;binomial&quot;: Binary
+logistic regression with pivoting; &quot;multinomial&quot;: Multinomial logistic (softmax) regression
+without pivoting, similar to glmnet. Users can print, make predictions on the produced model
+and save the model to the input path.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.logit(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.logit(data, formula, regParam = 0,
+  elasticNetParam = 0, maxIter = 100, tol = 1e-06, family = "auto",
+  standardization = TRUE, thresholds = 0.5, weightCol = NULL,
+  aggregationDepth = 2, lowerBoundsOnCoefficients = NULL,
+  upperBoundsOnCoefficients = NULL, lowerBoundsOnIntercepts = NULL,
+  upperBoundsOnIntercepts = NULL, handleInvalid = c("error", "keep",
+  "skip"))
+
+## S4 method for signature 'LogisticRegressionModel'
+summary(object)
+
+## S4 method for signature 'LogisticRegressionModel'
+predict(object, newData)
+
+## S4 method for signature 'LogisticRegressionModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>A symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>regParam</code></td>
+<td>
+<p>the regularization parameter.</p>
+</td></tr>
+<tr valign="top"><td><code>elasticNetParam</code></td>
+<td>
+<p>the ElasticNet mixing parameter. For alpha = 0.0, the penalty is an L2
+penalty. For alpha = 1.0, it is an L1 penalty. For 0.0 &lt; alpha &lt; 1.0,
+the penalty is a combination of L1 and L2. Default is 0.0 which is an
+L2 penalty.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>maximum iteration number.</p>
+</td></tr>
+<tr valign="top"><td><code>tol</code></td>
+<td>
+<p>convergence tolerance of iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>family</code></td>
+<td>
+<p>the name of family which is a description of the label distribution to be used
+in the model.
+Supported options:
+</p>
+
+<ul>
+<li><p>&quot;auto&quot;: Automatically select the family based on the number of classes:
+If number of classes == 1 || number of classes == 2, set to &quot;binomial&quot;.
+Else, set to &quot;multinomial&quot;.
+</p>
+</li>
+<li><p>&quot;binomial&quot;: Binary logistic regression with pivoting.
+</p>
+</li>
+<li><p>&quot;multinomial&quot;: Multinomial logistic (softmax) regression without
+pivoting.
+</p>
+</li></ul>
+</td></tr>
+<tr valign="top"><td><code>standardization</code></td>
+<td>
+<p>whether to standardize the training features before fitting the model.
+The coefficients of models will always be returned on the original scale,
+so it will be transparent for users. Note that with/without
+standardization, the models should always converge to the same
+solution when no regularization is applied. Default is TRUE, same as
+glmnet.</p>
+</td></tr>
+<tr valign="top"><td><code>thresholds</code></td>
+<td>
+<p>in binary classification, in range [0, 1]. If the estimated probability of
+class label 1 is &gt; threshold, then predict 1, else 0. A high threshold
+encourages the model to predict 0 more often; a low threshold encourages the
+model to predict 1 more often. Note: Setting this with threshold p is
+equivalent to setting thresholds c(1-p, p). In multiclass (or binary)
+classification, thresholds adjust the probability of predicting each class. The array must
+have length equal to the number of classes, with values &gt; 0, except that
+at most one value may be 0. The class with largest value p/t is predicted,
+where p is the original probability of that class and t is the class's
+threshold.</p>
+</td></tr>
+<tr valign="top"><td><code>weightCol</code></td>
+<td>
+<p>The weight column name.</p>
+</td></tr>
+<tr valign="top"><td><code>aggregationDepth</code></td>
+<td>
+<p>The depth for treeAggregate (greater than or equal to 2). If the
+dimensions of features or the number of partitions are large, this param
+could be adjusted to a larger size. This is an expert parameter. Default
+value should be good for most cases.</p>
+</td></tr>
+<tr valign="top"><td><code>lowerBoundsOnCoefficients</code></td>
+<td>
+<p>The lower bounds on coefficients if fitting under bound
+constrained optimization.
+The bound matrix must be compatible with the shape (1, number
+of features) for binomial regression, or (number of classes,
+number of features) for multinomial regression.
+It is an R matrix.</p>
+</td></tr>
+<tr valign="top"><td><code>upperBoundsOnCoefficients</code></td>
+<td>
+<p>The upper bounds on coefficients if fitting under bound
+constrained optimization.
+The bound matrix must be compatible with the shape (1, number
+of features) for binomial regression, or (number of classes,
+number of features) for multinomial regression.
+It is an R matrix.</p>
+</td></tr>
+<tr valign="top"><td><code>lowerBoundsOnIntercepts</code></td>
+<td>
+<p>The lower bounds on intercepts if fitting under bound constrained
+optimization.
+The bounds vector size must be equal to 1 for binomial regression,
+or the number
+of classes for multinomial regression.</p>
+</td></tr>
+<tr valign="top"><td><code>upperBoundsOnIntercepts</code></td>
+<td>
+<p>The upper bounds on intercepts if fitting under bound constrained
+optimization.
+The bound vector size must be equal to 1 for binomial regression,
+or the number of classes for multinomial regression.</p>
+</td></tr>
+<tr valign="top"><td><code>handleInvalid</code></td>
+<td>
+<p>How to handle invalid data (unseen labels or NULL values) in features and
+label column of string type.
+Supported options: &quot;skip&quot; (filter out rows with invalid data),
+&quot;error&quot; (throw an error), &quot;keep&quot; (put invalid data in
+a special additional bucket, at index numLabels). Default
+is &quot;error&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a LogisticRegressionModel fitted by <code>spark.logit</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>The directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>Whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.logit</code> returns a fitted logistic regression model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes <code>coefficients</code> (coefficients matrix of the fitted model).
+</p>
+<p><code>predict</code> returns the predicted values based on a LogisticRegressionModel.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.logit since 2.1.0
+</p>
+<p>summary(LogisticRegressionModel) since 2.1.0
+</p>
+<p>predict(LogisticRegressionModel) since 2.1.0
+</p>
+<p>write.ml(LogisticRegressionModel, character) since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D sparkR.session()
+##D # binary logistic regression
+##D t &lt;- as.data.frame(Titanic)
+##D training &lt;- createDataFrame(t)
+##D model &lt;- spark.logit(training, Survived ~ ., regParam = 0.5)
+##D summary &lt;- summary(model)
+##D 
+##D # fitted values on training data
+##D fitted &lt;- predict(model, training)
+##D 
+##D # save fitted model to input path
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D 
+##D # can also read back the saved model and predict
+##D # Note that summary does not work on loaded model
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+##D 
+##D # binary logistic regression against two classes with
+##D # upperBoundsOnCoefficients and upperBoundsOnIntercepts
+##D ubc &lt;- matrix(c(1.0, 0.0, 1.0, 0.0), nrow = 1, ncol = 4)
+##D model &lt;- spark.logit(training, Species ~ .,
+##D                       upperBoundsOnCoefficients = ubc,
+##D                       upperBoundsOnIntercepts = 1.0)
+##D 
+##D # multinomial logistic regression
+##D model &lt;- spark.logit(training, Class ~ ., regParam = 0.5)
+##D summary &lt;- summary(model)
+##D 
+##D # multinomial logistic regression with
+##D # lowerBoundsOnCoefficients and lowerBoundsOnIntercepts
+##D lbc &lt;- matrix(c(0.0, -1.0, 0.0, -1.0, 0.0, -1.0, 0.0, -1.0), nrow = 2, ncol = 4)
+##D lbi &lt;- as.array(c(0.0, 0.0))
+##D model &lt;- spark.logit(training, Species ~ ., family = &quot;multinomial&quot;,
+##D                      lowerBoundsOnCoefficients = lbc,
+##D                      lowerBoundsOnIntercepts = lbi)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
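
The thresholds argument above states two rules: the class with the largest p/t is predicted, and a single binary threshold p is equivalent to thresholds c(1-p, p). The following base-R sketch restates those rules as written in the page. It is not Spark's implementation, and the helper names predict_class and binary_thresholds are hypothetical.

```r
# Pick the class whose probability-to-threshold ratio p/t is largest.
# Returns a zero-based class index to match the "predict 1, else 0" wording.
predict_class <- function(probs, thresholds) {
  stopifnot(length(probs) == length(thresholds))
  which.max(probs / thresholds) - 1L
}

# Expand a single binary threshold p into the equivalent vector c(1 - p, p).
binary_thresholds <- function(p) c(1 - p, p)

probs <- c(0.4, 0.6)  # estimated probabilities of class 0 and class 1

high  <- predict_class(probs, binary_thresholds(0.7))  # 0.6 is not > 0.7
low   <- predict_class(probs, binary_thresholds(0.5))  # 0.6 is > 0.5
multi <- predict_class(c(0.2, 0.5, 0.3), c(0.5, 0.9, 0.2))
```

In the binary case the ratio rule reduces to comparing the class 1 probability with the threshold, which is why a higher threshold pushes predictions toward class 0.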

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.mlp.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.mlp.html b/site/docs/2.3.1/api/R/spark.mlp.html
new file mode 100644
index 0000000..72eb9a5
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.mlp.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Multilayer Perceptron Classification Model</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.mlp {SparkR}"><tr><td>spark.mlp {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Multilayer Perceptron Classification Model</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.mlp</code> fits a multi-layer perceptron neural network model against a SparkDataFrame.
+Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
+predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+Only categorical data is supported.
+For more details, see
+<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html">
+Multilayer Perceptron</a>
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.mlp(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.mlp(data, formula, layers,
+  blockSize = 128, solver = "l-bfgs", maxIter = 100, tol = 1e-06,
+  stepSize = 0.03, seed = NULL, initialWeights = NULL,
+  handleInvalid = c("error", "keep", "skip"))
+
+## S4 method for signature 'MultilayerPerceptronClassificationModel'
+summary(object)
+
+## S4 method for signature 'MultilayerPerceptronClassificationModel'
+predict(object, newData)
+
+## S4 method for signature 'MultilayerPerceptronClassificationModel,character'
+write.ml(object,
+  path, overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>layers</code></td>
+<td>
+<p>integer vector containing the number of nodes for each layer.</p>
+</td></tr>
+<tr valign="top"><td><code>blockSize</code></td>
+<td>
+<p>blockSize parameter.</p>
+</td></tr>
+<tr valign="top"><td><code>solver</code></td>
+<td>
+<p>solver parameter, supported options: &quot;gd&quot; (minibatch gradient descent) or &quot;l-bfgs&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>maxIter</code></td>
+<td>
+<p>maximum iteration number.</p>
+</td></tr>
+<tr valign="top"><td><code>tol</code></td>
+<td>
+<p>convergence tolerance of iterations.</p>
+</td></tr>
+<tr valign="top"><td><code>stepSize</code></td>
+<td>
+<p>stepSize parameter.</p>
+</td></tr>
+<tr valign="top"><td><code>seed</code></td>
+<td>
+<p>seed parameter for weights initialization.</p>
+</td></tr>
+<tr valign="top"><td><code>initialWeights</code></td>
+<td>
+<p>initialWeights parameter for weights initialization; it should be a
+numeric vector.</p>
+</td></tr>
+<tr valign="top"><td><code>handleInvalid</code></td>
+<td>
+<p>How to handle invalid data (unseen labels or NULL values) in features and
+label column of string type.
+Supported options: &quot;skip&quot; (filter out rows with invalid data),
+&quot;error&quot; (throw an error), &quot;keep&quot; (put invalid data in
+a special additional bucket, at index numLabels). Default
+is &quot;error&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a Multilayer Perceptron Classification Model fitted by <code>spark.mlp</code></p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>the directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.mlp</code> returns a fitted Multilayer Perceptron Classification Model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes <code>numOfInputs</code> (number of inputs), <code>numOfOutputs</code>
+(number of outputs), <code>layers</code> (array of layer sizes including input
+and output layers), and <code>weights</code> (the weights of layers).
+<code>weights</code> is a numeric vector whose length equals the number of weights expected
+given the architecture (e.g., for an 8-10-2 network, 112 connection weights).
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
+&quot;prediction&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.mlp since 2.1.0
+</p>
+<p>summary(MultilayerPerceptronClassificationModel) since 2.1.0
+</p>
+<p>predict(MultilayerPerceptronClassificationModel) since 2.1.0
+</p>
+<p>write.ml(MultilayerPerceptronClassificationModel, character) since 2.1.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p><a href="read.ml.html">read.ml</a>
+</p>
+<p><a href="write.ml.html">write.ml</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D df &lt;- read.df(&quot;data/mllib/sample_multiclass_classification_data.txt&quot;, source = &quot;libsvm&quot;)
+##D 
+##D # fit a Multilayer Perceptron Classification Model
+##D model &lt;- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 3), solver = &quot;l-bfgs&quot;,
+##D                    maxIter = 100, tol = 0.5, stepSize = 1, seed = 1,
+##D                    initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9))
+##D 
+##D # get the summary of the model
+##D summary(model)
+##D 
+##D # make predictions
+##D predictions &lt;- predict(model, df)
+##D 
+##D # save and load the model
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>
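
The Value section of spark.mlp.html above gives 112 weights for an 8-10-2 network. That figure is reproduced if every node in a non-input layer carries one bias term in addition to its incoming connection weights. The base-R sketch below checks that arithmetic under that assumption; `mlpWeightCount` is a hypothetical helper written for illustration and is not part of SparkR.

```r
# Hypothetical helper (not a SparkR function): count the weights of a fully
# connected feed-forward network described by its layer sizes.
# Assumption: each node in a non-input layer has one bias term.
mlpWeightCount <- function(layers) {
  stopifnot(length(layers) >= 2)
  fanIn <- layers[-length(layers)]  # sizes of the layers feeding each block
  fanOut <- layers[-1]              # sizes of the layers receiving each block
  sum((fanIn + 1) * fanOut)         # +1 adds the bias of each receiving node
}

mlpWeightCount(c(8, 10, 2))  # (8 + 1) * 10 + (10 + 1) * 2 = 112
mlpWeightCount(c(4, 3))      # (4 + 1) * 3 = 15
```

The second call matches the example in the same file, where `layers = c(4, 3)` is paired with an `initialWeights` vector of 15 values.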

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.naiveBayes.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.naiveBayes.html b/site/docs/2.3.1/api/R/spark.naiveBayes.html
new file mode 100644
index 0000000..f200cdb
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.naiveBayes.html
@@ -0,0 +1,152 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Naive Bayes Models</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.naiveBayes {SparkR}"><tr><td>spark.naiveBayes {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Naive Bayes Models</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.naiveBayes</code> fits a Bernoulli naive Bayes model against a SparkDataFrame.
+Users can call <code>summary</code> to print a summary of the fitted model, <code>predict</code> to make
+predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models.
+Only categorical data is supported.
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.naiveBayes(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.naiveBayes(data, formula,
+  smoothing = 1, handleInvalid = c("error", "keep", "skip"))
+
+## S4 method for signature 'NaiveBayesModel'
+summary(object)
+
+## S4 method for signature 'NaiveBayesModel'
+predict(object, newData)
+
+## S4 method for signature 'NaiveBayesModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a <code>SparkDataFrame</code> of observations and labels for model fitting.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', '.', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional argument(s) passed to the method. Currently only <code>smoothing</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>smoothing</code></td>
+<td>
+<p>smoothing parameter.</p>
+</td></tr>
+<tr valign="top"><td><code>handleInvalid</code></td>
+<td>
+<p>How to handle invalid data (unseen labels or NULL values) in features and
+label column of string type.
+Supported options: &quot;skip&quot; (filter out rows with invalid data),
+&quot;error&quot; (throw an error), &quot;keep&quot; (put invalid data in
+a special additional bucket, at index numLabels). Default
+is &quot;error&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>a naive Bayes model fitted by <code>spark.naiveBayes</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>the directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.naiveBayes</code> returns a fitted naive Bayes model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list includes <code>apriori</code> (the label distribution) and
+<code>tables</code> (conditional probabilities given the target label).
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
+&quot;prediction&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.naiveBayes since 2.0.0
+</p>
+<p>summary(NaiveBayesModel) since 2.0.0
+</p>
+<p>predict(NaiveBayesModel) since 2.0.0
+</p>
+<p>write.ml(NaiveBayesModel, character) since 2.0.0
+</p>
+
+
+<h3>See Also</h3>
+
+<p>e1071: <a href="https://cran.r-project.org/package=e1071">https://cran.r-project.org/package=e1071</a>
+</p>
+<p><a href="write.ml.html">write.ml</a>
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D data &lt;- as.data.frame(UCBAdmissions)
+##D df &lt;- createDataFrame(data)
+##D 
+##D # fit a Bernoulli naive Bayes model
+##D model &lt;- spark.naiveBayes(df, Admit ~ Gender + Dept, smoothing = 0)
+##D 
+##D # get the summary of the model
+##D summary(model)
+##D 
+##D # make predictions
+##D predictions &lt;- predict(model, df)
+##D 
+##D # save and load the model
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/7a017e0a/site/docs/2.3.1/api/R/spark.randomForest.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/R/spark.randomForest.html b/site/docs/2.3.1/api/R/spark.randomForest.html
new file mode 100644
index 0000000..1230088
--- /dev/null
+++ b/site/docs/2.3.1/api/R/spark.randomForest.html
@@ -0,0 +1,248 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Random Forest Model for Regression and Classification</title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<link rel="stylesheet" type="text/css" href="R.css" />
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script>
+<script>hljs.initHighlightingOnLoad();</script>
+</head><body>
+
+<table width="100%" summary="page for spark.randomForest {SparkR}"><tr><td>spark.randomForest {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table>
+
+<h2>Random Forest Model for Regression and Classification</h2>
+
+<h3>Description</h3>
+
+<p><code>spark.randomForest</code> fits a Random Forest Regression model or Classification model on
+a SparkDataFrame. Users can call <code>summary</code> to get a summary of the fitted Random Forest
+model, <code>predict</code> to make predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to
+save/load fitted models.
+For more details, see
+<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression">
+Random Forest Regression</a> and
+<a href="http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier">
+Random Forest Classification</a>
+</p>
+
+
+<h3>Usage</h3>
+
+<pre>
+spark.randomForest(data, formula, ...)
+
+## S4 method for signature 'SparkDataFrame,formula'
+spark.randomForest(data, formula,
+  type = c("regression", "classification"), maxDepth = 5, maxBins = 32,
+  numTrees = 20, impurity = NULL, featureSubsetStrategy = "auto",
+  seed = NULL, subsamplingRate = 1, minInstancesPerNode = 1,
+  minInfoGain = 0, checkpointInterval = 10, maxMemoryInMB = 256,
+  cacheNodeIds = FALSE, handleInvalid = c("error", "keep", "skip"))
+
+## S4 method for signature 'RandomForestRegressionModel'
+summary(object)
+
+## S3 method for class 'summary.RandomForestRegressionModel'
+print(x, ...)
+
+## S4 method for signature 'RandomForestClassificationModel'
+summary(object)
+
+## S3 method for class 'summary.RandomForestClassificationModel'
+print(x, ...)
+
+## S4 method for signature 'RandomForestRegressionModel'
+predict(object, newData)
+
+## S4 method for signature 'RandomForestClassificationModel'
+predict(object, newData)
+
+## S4 method for signature 'RandomForestRegressionModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+
+## S4 method for signature 'RandomForestClassificationModel,character'
+write.ml(object, path,
+  overwrite = FALSE)
+</pre>
+
+
+<h3>Arguments</h3>
+
+<table summary="R argblock">
+<tr valign="top"><td><code>data</code></td>
+<td>
+<p>a SparkDataFrame for training.</p>
+</td></tr>
+<tr valign="top"><td><code>formula</code></td>
+<td>
+<p>a symbolic description of the model to be fitted. Currently only a few formula
+operators are supported, including '~', ':', '+', and '-'.</p>
+</td></tr>
+<tr valign="top"><td><code>...</code></td>
+<td>
+<p>additional arguments passed to the method.</p>
+</td></tr>
+<tr valign="top"><td><code>type</code></td>
+<td>
+<p>type of model to fit, one of &quot;regression&quot; or &quot;classification&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>maxDepth</code></td>
+<td>
+<p>Maximum depth of the tree (&gt;= 0).</p>
+</td></tr>
+<tr valign="top"><td><code>maxBins</code></td>
+<td>
+<p>Maximum number of bins used for discretizing continuous features and for choosing
+how to split on features at each node. More bins give higher granularity. Must be
+&gt;= 2 and &gt;= number of categories in any categorical feature.</p>
+</td></tr>
+<tr valign="top"><td><code>numTrees</code></td>
+<td>
+<p>Number of trees to train (&gt;= 1).</p>
+</td></tr>
+<tr valign="top"><td><code>impurity</code></td>
+<td>
+<p>Criterion used for information gain calculation.
+For regression, must be &quot;variance&quot;. For classification, must be one of
+&quot;entropy&quot; and &quot;gini&quot;, default is &quot;gini&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>featureSubsetStrategy</code></td>
+<td>
+<p>The number of features to consider for splits at each tree node.
+Supported options: &quot;auto&quot;, &quot;all&quot;, &quot;onethird&quot;, &quot;sqrt&quot;, &quot;log2&quot;, (0.0-1.0], [1-n].</p>
+</td></tr>
+<tr valign="top"><td><code>seed</code></td>
+<td>
+<p>integer seed for random number generation.</p>
+</td></tr>
+<tr valign="top"><td><code>subsamplingRate</code></td>
+<td>
+<p>Fraction of the training data used for learning each decision tree, in
+range (0, 1].</p>
+</td></tr>
+<tr valign="top"><td><code>minInstancesPerNode</code></td>
+<td>
+<p>Minimum number of instances each child must have after split.</p>
+</td></tr>
+<tr valign="top"><td><code>minInfoGain</code></td>
+<td>
+<p>Minimum information gain for a split to be considered at a tree node.</p>
+</td></tr>
+<tr valign="top"><td><code>checkpointInterval</code></td>
+<td>
+<p>Param for setting the checkpoint interval (&gt;= 1) or disabling checkpointing (-1).
+Note: this setting will be ignored if the checkpoint directory is not
+set.</p>
+</td></tr>
+<tr valign="top"><td><code>maxMemoryInMB</code></td>
+<td>
+<p>Maximum memory in MB allocated to histogram aggregation.</p>
+</td></tr>
+<tr valign="top"><td><code>cacheNodeIds</code></td>
+<td>
+<p>If FALSE, the algorithm will pass trees to executors to match instances with
+nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
+can speed up training of deeper trees. Users can set how often the cache should be
+checkpointed, or disable checkpointing, by setting checkpointInterval.</p>
+</td></tr>
+<tr valign="top"><td><code>handleInvalid</code></td>
+<td>
+<p>How to handle invalid data (unseen labels or NULL values) in features and
+label column of string type in classification model.
+Supported options: &quot;skip&quot; (filter out rows with invalid data),
+&quot;error&quot; (throw an error), &quot;keep&quot; (put invalid data in
+a special additional bucket, at index numLabels). Default
+is &quot;error&quot;.</p>
+</td></tr>
+<tr valign="top"><td><code>object</code></td>
+<td>
+<p>A fitted Random Forest regression model or classification model.</p>
+</td></tr>
+<tr valign="top"><td><code>x</code></td>
+<td>
+<p>summary object of Random Forest regression model or classification model
+returned by <code>summary</code>.</p>
+</td></tr>
+<tr valign="top"><td><code>newData</code></td>
+<td>
+<p>a SparkDataFrame for testing.</p>
+</td></tr>
+<tr valign="top"><td><code>path</code></td>
+<td>
+<p>The directory where the model is saved.</p>
+</td></tr>
+<tr valign="top"><td><code>overwrite</code></td>
+<td>
+<p>Whether to overwrite if the output path already exists. Default is FALSE,
+which means an exception is thrown if the output path exists.</p>
+</td></tr>
+</table>
+
+
+<h3>Value</h3>
+
+<p><code>spark.randomForest</code> returns a fitted Random Forest model.
+</p>
+<p><code>summary</code> returns summary information of the fitted model, which is a list.
+The list of components includes <code>formula</code> (formula),
+<code>numFeatures</code> (number of features), <code>features</code> (list of features),
+<code>featureImportances</code> (feature importances), <code>maxDepth</code> (max depth of trees),
+<code>numTrees</code> (number of trees), and <code>treeWeights</code> (tree weights).
+</p>
+<p><code>predict</code> returns a SparkDataFrame containing predicted labels in a column named
+&quot;prediction&quot;.
+</p>
+
+
+<h3>Note</h3>
+
+<p>spark.randomForest since 2.1.0
+</p>
+<p>summary(RandomForestRegressionModel) since 2.1.0
+</p>
+<p>print.summary.RandomForestRegressionModel since 2.1.0
+</p>
+<p>summary(RandomForestClassificationModel) since 2.1.0
+</p>
+<p>print.summary.RandomForestClassificationModel since 2.1.0
+</p>
+<p>predict(RandomForestRegressionModel) since 2.1.0
+</p>
+<p>predict(RandomForestClassificationModel) since 2.1.0
+</p>
+<p>write.ml(RandomForestRegressionModel, character) since 2.1.0
+</p>
+<p>write.ml(RandomForestClassificationModel, character) since 2.1.0
+</p>
+
+
+<h3>Examples</h3>
+
+<pre><code class="r">## Not run: 
+##D # fit a Random Forest Regression Model
+##D df &lt;- createDataFrame(longley)
+##D model &lt;- spark.randomForest(df, Employed ~ ., type = &quot;regression&quot;, maxDepth = 5, maxBins = 16)
+##D 
+##D # get the summary of the model
+##D summary(model)
+##D 
+##D # make predictions
+##D predictions &lt;- predict(model, df)
+##D 
+##D # save and load the model
+##D path &lt;- &quot;path/to/model&quot;
+##D write.ml(model, path)
+##D savedModel &lt;- read.ml(path)
+##D summary(savedModel)
+##D 
+##D # fit a Random Forest Classification Model
+##D t &lt;- as.data.frame(Titanic)
+##D df &lt;- createDataFrame(t)
+##D model &lt;- spark.randomForest(df, Survived ~ Freq + Age, &quot;classification&quot;)
+## End(Not run)
+</code></pre>
+
+
+<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.3.1 <a href="00Index.html">Index</a>]</div>
+</body></html>



