Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 3985C200C0A for ; Sat, 14 Jan 2017 02:24:51 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 380AB160B47; Sat, 14 Jan 2017 01:24:51 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 80062160B3F for ; Sat, 14 Jan 2017 02:24:50 +0100 (CET) Received: (qmail 33775 invoked by uid 500); 14 Jan 2017 01:24:49 -0000 Mailing-List: contact reviews-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@spark.apache.org Received: (qmail 33764 invoked by uid 99); 14 Jan 2017 01:24:49 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 14 Jan 2017 01:24:49 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 5F958DFAAF; Sat, 14 Jan 2017 01:24:49 +0000 (UTC) From: felixcheung To: reviews@spark.apache.org Reply-To: reviews@spark.apache.org References: In-Reply-To: Subject: [GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ... Content-Type: text/plain Message-Id: <20170114012449.5F958DFAAF@git1-us-west.apache.org> Date: Sat, 14 Jan 2017 01:24:49 +0000 (UTC) archived-at: Sat, 14 Jan 2017 01:24:51 -0000 Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r96103542 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @note LDAModel since 2.1.0 setClass("LDAModel", representation(jobj = "jobj")) +#' Bisecting K-Means Clustering Model +#' +#' Fits a bisecting k-means clustering model against a Spark DataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#' operators are supported, including '~', '.', ':', '+', and '-'. +#' Note that the response variable of formula is empty in spark.bisectingKmeans. +#' @param k the desired number of leaf clusters. Must be > 1. +#' The actual number could be smaller if there are no divisible leaf clusters. +#' @param maxIter maximum iteration number. +#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0) +#' or the minimum proportion of points (if less than 1.0) of a divisible cluster. +#' @param seed the random seed. +#' @param ... additional argument(s) passed to the method. +#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model. +#' @rdname spark.bisectingKmeans +#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method +#' @name spark.bisectingKmeans +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' data(iris) --- End diff -- don't need `data(iris)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastructure@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org