Subject: [GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
From: jkbradley
Date: Mon, 24 Oct 2016 20:36:53 +0000 (UTC)

Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/11119

> Would you mind pointing me to an example of an algorithm which only copies some, but not all, of the estimator params?
ALS is a good example: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L98

> [users identifying initialization method]

I agree it's misleading to have mismatched Params initialModel and initMode, especially if Model.initialModel does not exist. I'd say this is an ideal solution:

* (in this PR) Have setInitialModel also set k, initMode, etc. (where we create a new initMode called "initialModel").
  * Calling `setInitMode("initialModel")` would probably need to throw an error. This is a minor issue IMO.
* (in a follow-up PR) The bullet point above has one bigger issue: setting initialModel via `km.set(km.initialModel, initialModel)` would bypass the setter method and therefore not set k, initMode, etc. appropriately. This issue with tied Params has appeared elsewhere in MLlib as well. We could fix it by having the `Params.set` method use Scala reflection to call the corresponding setter method. We'd just have to take extra care to test this well.
  * There are some Params in Models without matching setter methods. Those were added so that Estimator Params are easily accessible from Models. We'll just have to keep these in mind when writing unit tests.
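A minimal, Spark-free sketch of the tied-Params idea discussed above, under stated assumptions: `KMeansLike`, `Param`, `CentersModel`, `paramMap`, and `setViaSetter` are hypothetical stand-ins, not Spark's actual `Params` API. It shows `setInitialModel` also setting `k` and `initMode`, `setInitMode("initialModel")` being rejected, the generic `set` bypassing the setter, and a reflection-based variant that routes to the matching setter. For brevity it uses plain Java reflection (`getClass.getMethods`) rather than Scala reflection.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; these are NOT Spark's Param/Params/KMeansModel classes.
final case class Param[T](name: String)

final case class CentersModel(centers: Vector[Vector[Double]]) {
  def k: Int = centers.length
}

class KMeansLike {
  val k = Param[Int]("k")
  val initMode = Param[String]("initMode")
  val initialModel = Param[CentersModel]("initialModel")

  // Assumed defaults for illustration only.
  private val paramMap = mutable.Map[String, Any](k.name -> 2, initMode.name -> "k-means||")

  def get[T](p: Param[T]): Option[T] = paramMap.get(p.name).map(_.asInstanceOf[T])

  // Generic setter: writes the value directly, so tied Params are NOT updated.
  def set[T](p: Param[T], value: T): this.type = { paramMap(p.name) = value; this }

  def setK(value: Int): this.type = set(k, value)

  // "initialModel" may only be chosen indirectly, through setInitialModel.
  def setInitMode(value: String): this.type = {
    require(value != "initialModel", "use setInitialModel to select the initialModel init mode")
    set(initMode, value)
  }

  // Tied setter: setting the initial model also sets k and initMode.
  def setInitialModel(model: CentersModel): this.type = {
    set(initialModel, model)
    set(k, model.k)
    set(initMode, "initialModel")
  }

  // Reflection-based set (hypothetical): calls a public one-argument method named
  // "set" + capitalized Param name if one exists, otherwise falls back to plain set.
  def setViaSetter[T](p: Param[T], value: T): this.type = {
    val setterName = "set" + p.name.capitalize
    getClass.getMethods.find(m => m.getName == setterName && m.getParameterCount == 1) match {
      case Some(m) => m.invoke(this, value.asInstanceOf[AnyRef]); this
      case None    => set(p, value)
    }
  }
}
```

In this sketch, `km.set(km.initialModel, model)` leaves `k` at its default, while `km.setViaSetter(km.initialModel, model)` also updates `k` and `initMode`. A Param with no matching setter silently falls back to the plain `set`, which is the Model-Params case the last bullet says unit tests need to cover.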