From: sethah
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #15435: [SPARK-17139][ML] Add model summary for Multinomi...
Message-Id: <20170104190420.2F7D2DFADF@git1-us-west.apache.org>
Date: Wed, 4 Jan 2017 19:04:20 +0000 (UTC)

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15435#discussion_r94521336

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -1095,6 +1131,89 @@ private[classification] class MultiClassSummarizer extends Serializable {
     }
     
     /**
    + * :: Experimental ::
    + * Summary of multi-classification algorithms.
    + *
    + * @param predictions [[DataFrame]] produced by model.transform().
    + * @param predictionCol Name for column of prediction in `predictions`.
    + * @param labelCol Name for column of label in `predictions`.
    + */
    +@Experimental
    +@Since("2.1.0")
    +class MulticlassSummary private[ml] (
    +    @transient val predictions: DataFrame,
    +    val predictionCol: String,
    +    val labelCol: String) extends Serializable {
    +
    +  @transient private val multinomialMetrics = {
    +    new MulticlassMetrics(
    +      predictions.select(
    +        col(predictionCol),
    +        col(labelCol).cast(DoubleType))
    +        .rdd.map { case Row(prediction: Double, label: Double) => (prediction, label) })
    +  }
    +
    +  /** Returns false positive rate for each label. */
    +  @Since("2.1.0")
    +  @transient lazy val falsePositiveRateByLabel: Array[Double] = {
    +    multinomialMetrics.labels.map(label => multinomialMetrics.falsePositiveRate(label))
    +  }
    +
    +  /** Returns precision for each label. */
    +  @Since("2.1.0")
    +  @transient lazy val precisionByLabel: Array[Double] = {
    +    multinomialMetrics.labels.map(label => multinomialMetrics.precision(label))
    +  }
    +
    +  /** Returns recall for each label. */
    +  @Since("2.1.0")
    +  @transient lazy val recallByLabel: Array[Double] = {
    +    multinomialMetrics.labels.map(label => multinomialMetrics.recall(label))
    +  }
    +
    +  /**
    +   * Returns f-measure for each label.
    +   * @param beta the beta parameter.
    --- End diff --

This description is not helpful.
Let's either put nothing, or say something like "parameter which controls the balance between precision and recall in the f-measure".
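
To make the suggested wording concrete, here is a minimal sketch of how beta shifts the f-measure between precision and recall. It assumes the standard F-beta definition, (1 + beta^2) * P * R / (beta^2 * P + R). The names `FMeasureSketch` and `fMeasure` are hypothetical and used only for illustration; they are not part of this PR.

```scala
// Hypothetical helper, for illustration only (not part of the PR).
object FMeasureSketch {

  /**
   * F-beta score computed from precision and recall.
   * Returns 0.0 when the denominator is zero, so there is no division by zero.
   */
  def fMeasure(precision: Double, recall: Double, beta: Double): Double = {
    val betaSq = beta * beta
    val denom = betaSq * precision + recall
    if (denom == 0.0) 0.0 else (1.0 + betaSq) * precision * recall / denom
  }

  def main(args: Array[String]): Unit = {
    // Assumed example values: low precision, perfect recall.
    val precision = 0.5
    val recall = 1.0

    // beta = 1 weights precision and recall equally (harmonic mean).
    println(s"beta = 1.0 -> ${fMeasure(precision, recall, 1.0)}")
    // beta > 1 pulls the score toward recall.
    println(s"beta = 2.0 -> ${fMeasure(precision, recall, 2.0)}")
    // beta < 1 pulls the score toward precision.
    println(s"beta = 0.5 -> ${fMeasure(precision, recall, 0.5)}")
  }
}
```

With precision 0.5 and recall 1.0, the score is about 0.667 at beta = 1, rises to about 0.833 at beta = 2 (closer to recall), and falls to about 0.556 at beta = 0.5 (closer to precision), which is the "balance" the doc comment could describe.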