From: GitBox
To: reviews@spark.apache.org
Subject: [GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
Message-ID: <158183844675.23922.3087012564787455319.gitbox@gitbox.apache.org>
Date: Sun, 16 Feb 2020 07:34:06 -0000

huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
URL: https://github.com/apache/spark/pull/27570#discussion_r379880985

##########
File path: R/pkg/R/mllib_classification.R
##########
@@ -649,3 +655,155 @@ setMethod("write.ml", signature(object = "NaiveBayesModel", path = "character"),
           function(object, path, overwrite = FALSE) {
             write_internal(object, path, overwrite)
           })
+
+
+#' Factorization Machines Classification Model
+#'
+#' \code{spark.fmClassifier} fits a factorization classification model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#' Only categorical data is supported.
+#'
+#' @param data a \code{SparkDataFrame} of observations and labels for model fitting.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param factorSize dimensionality of the factors.
+#' @param fitLinear whether to fit linear term.  # TODO Can we express this with formula?
+#' @param regParam the regularization parameter.
+#' @param miniBatchFraction the mini-batch fraction parameter.
+#' @param initStd the standard deviation of initial coefficients.
+#' @param maxIter maximum iteration number.
+#' @param stepSize stepSize parameter.
+#' @param tol convergence tolerance of iterations.
+#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW".
+#' @param thresholds in binary classification, in range [0, 1]. If the estimated probability of
+#'                   class label 1 is > threshold, then predict 1, else 0. A high threshold
+#'                   encourages the model to predict 0 more often; a low threshold encourages the
+#'                   model to predict 1 more often. Note: Setting this with threshold p is
+#'                   equivalent to setting thresholds c(1-p, p).
+#' @param seed seed parameter for weights initialization.
+#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and
+#'                      label column of string type.
+#'                      Supported options: "skip" (filter out rows with invalid data),
+#'                                         "error" (throw an error), "keep" (put invalid data in
+#'                                         a special additional bucket, at index numLabels). Default
+#'                                         is "error".
+#' @param ... additional arguments passed to the method.
+#' @return \code{spark.fmClassifier} returns a fitted Factorization Machines Classification Model.
+#' @rdname spark.fmClassifier
+#' @aliases spark.fmClassifier,SparkDataFrame,formula-method
+#' @name spark.fmClassifier
+#' @seealso \link{read.ml}
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
+#'
+#' # fit Factorization Machines Classification Model
+#' model <- spark.fmClassifier(
+#'            df, label ~ features,
+#'            regParam = 0.01, maxIter = 10, fitLinear = TRUE
+#'          )
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.fmClassifier since 3.0.0
+setMethod("spark.fmClassifier", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0,
+                   miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0,
+                   tol = 1e-6, solver = c("adamW", "gd"), thresholds = NULL, seed = NULL,
+                   handleInvalid = c("error", "keep", "skip")) {

Review comment:
any reason why ```fitIntercept``` is not here?
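
Side note on the `thresholds` doc text in the diff: the claim that a single threshold p is equivalent to thresholds c(1-p, p) can be checked with a small base-R sketch. This is only an illustration, not code from the PR. Both helper names are hypothetical, and it assumes the rule Spark ML documents for `thresholds`, namely that the class with the largest probability-to-threshold ratio is predicted.

```r
# Hypothetical helper (not part of SparkR): pick the class whose
# probability / threshold ratio is largest. This is assumed to mirror
# the rule Spark ML documents for the `thresholds` param.
predict_with_thresholds <- function(prob1, thresholds) {
  probs <- c(1 - prob1, prob1)        # P(label = 0), P(label = 1)
  which.max(probs / thresholds) - 1L  # convert 1-based index to a 0/1 label
}

# Hypothetical helper: the single-threshold rule from the doc text,
# "predict 1 if the estimated probability of label 1 is > threshold".
predict_with_threshold <- function(prob1, p) {
  as.integer(prob1 > p)
}

# Compare both rules on a few probabilities around p = 0.7.
p <- 0.7
for (prob1 in c(0.1, 0.5, 0.69, 0.71, 0.9)) {
  stopifnot(
    predict_with_thresholds(prob1, c(1 - p, p)) == predict_with_threshold(prob1, p)
  )
}
```

The two rules agree because prob1 / p > (1 - prob1) / (1 - p) rearranges to prob1 > p whenever 0 < p < 1.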