Date: Sat, 22 Jul 2017 01:56:01 +0000 (UTC)
From: "yuhao yang (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-21086) CrossValidator, TrainValidationSplit should preserve all models after fitting

[ https://issues.apache.org/jira/browse/SPARK-21086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097062#comment-16097062 ]

yuhao yang commented on SPARK-21086:
------------------------------------

Sure, indices sound fine.

For the driver memory, especially for CrossValidator, caching all the trained models would be impractical and unnecessary. Although every model is collected to the driver, the models are fitted sequentially, so with the current implementation of CrossValidator the GC can kick in and reclaim each earlier model once it is no longer referenced, which is especially helpful for large models.

> CrossValidator, TrainValidationSplit should preserve all models after fitting
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-21086
>                 URL: https://issues.apache.org/jira/browse/SPARK-21086
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>
> I've heard multiple requests for having CrossValidatorModel and TrainValidationSplitModel preserve the full list of fitted models. This sounds very valuable.
> One decision should be made before we do this: Should we save and load the models in ML persistence? That could blow up the size of a saved Pipeline if the models are large.
> * I suggest *not* saving the models by default but allowing saving if specified. We could specify whether to save the models as an extra Param for CrossValidatorModelWriter, but we would have to make sure to expose CrossValidatorModelWriter as a public API and modify the return type of CrossValidatorModel.write to be CrossValidatorModelWriter (but this will not be a breaking change).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
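The memory argument in the comment can be illustrated with a plain-Python sketch of a sequential k-fold loop. It is not Spark's CrossValidator: the toy 1-D ridge estimator, the `collect_sub_models` flag, and the `sub_models[fold][param_index]` layout (one possible reading of "indices") are assumptions made for illustration. With collection off, each fitted model becomes unreachable once the loop variable is rebound to the next one, so the garbage collector may reclaim it; with collection on, every model stays reachable for as long as the result object lives.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical toy estimator standing in for a Spark ML Estimator:
# 1-D ridge regression through the origin, y ~ w * x.
@dataclass
class RidgeModel:
    w: float

    def predict(self, x: float) -> float:
        return self.w * x


def fit_ridge(data: Sequence[Tuple[float, float]], lam: float) -> RidgeModel:
    # Closed form: w = sum(x*y) / (sum(x*x) + lam)
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return RidgeModel(w=sxy / (sxx + lam))


def mse(model: RidgeModel, data: Sequence[Tuple[float, float]]) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


@dataclass
class CVResult:
    avg_metrics: List[float]  # one average metric per param-grid index
    best_index: int           # index of the lowest average MSE
    # sub_models[fold][param_index]; None unless collection was requested
    sub_models: Optional[List[List[RidgeModel]]]


def cross_validate(data: Sequence[Tuple[float, float]], lams: Sequence[float],
                   num_folds: int, collect_sub_models: bool = False) -> CVResult:
    folds = [data[i::num_folds] for i in range(num_folds)]
    metric_sums = [0.0] * len(lams)
    sub_models: Optional[List[List[RidgeModel]]] = [] if collect_sub_models else None
    for k in range(num_folds):
        validation = folds[k]
        training = [row for j, f in enumerate(folds) if j != k for row in f]
        fold_models: List[RidgeModel] = []
        for i, lam in enumerate(lams):
            model = fit_ridge(training, lam)  # models are fitted one at a time
            metric_sums[i] += mse(model, validation)
            if collect_sub_models:
                fold_models.append(model)  # extra reference keeps the model alive
            # Otherwise `model` is rebound on the next iteration, the old model
            # becomes unreachable, and the garbage collector may reclaim it.
        if collect_sub_models:
            sub_models.append(fold_models)
    avg = [s / num_folds for s in metric_sums]
    best = min(range(len(lams)), key=lambda i: avg[i])
    return CVResult(avg, best, sub_models)
```

With `data = [(float(x), 2.0 * x) for x in range(1, 11)]` and `lams = [0.0, 10.0, 100.0]`, the unpenalized model fits every training split exactly, so index 0 wins with an average MSE of 0.0 whether or not sub-models are collected; only the memory held by the result differs.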
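The opt-in persistence proposed in the issue description can be sketched in plain Python. This is not Spark's ML persistence API: `CVModel`, `CVModelWriter`, `persist_sub_models`, the path layout, and the dict standing in for a filesystem are all hypothetical names chosen for illustration. It shows two points from the proposal: by default only the best model is written, and `write()` returns the concrete writer type so the extra option is reachable from the model.

```python
import json
from typing import Any, Dict, List, Optional

Model = Dict[str, Any]  # hypothetical: a model is just a JSON-serializable dict here


class CVModelWriter:
    """Hypothetical writer: sub-models are saved only when explicitly requested."""

    def __init__(self, best_model: Model, sub_models: Optional[List[List[Model]]]):
        self._best_model = best_model
        self._sub_models = sub_models
        self._persist_sub_models = False  # off by default to keep saved pipelines small

    def persist_sub_models(self, enabled: bool) -> "CVModelWriter":
        self._persist_sub_models = enabled
        return self  # return self so calls can be chained

    def save(self, store: Dict[str, str], path: str) -> None:
        # `store` maps file paths to file contents, standing in for a filesystem.
        store[f"{path}/bestModel.json"] = json.dumps(self._best_model)
        if not self._persist_sub_models:
            return
        if self._sub_models is None:
            raise ValueError("sub-models were not collected during fitting")
        for k, fold in enumerate(self._sub_models):
            for i, model in enumerate(fold):
                store[f"{path}/subModels/fold{k}/param{i}.json"] = json.dumps(model)


class CVModel:
    def __init__(self, best_model: Model, sub_models: Optional[List[List[Model]]] = None):
        self.best_model = best_model
        self.sub_models = sub_models

    def write(self) -> CVModelWriter:
        # Returning the concrete writer type (not a generic base writer) is what
        # makes persist_sub_models() reachable from model.write().
        return CVModelWriter(self.best_model, self.sub_models)
```

For example, `CVModel(best, subs).write().save(store, "cv")` writes a single entry, while `CVModel(best, subs).write().persist_sub_models(True).save(store, "cv")` also writes one entry per fold and param-grid index.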