spark-issues mailing list archives

From "yuhao yang (JIRA)" <>
Subject [jira] [Commented] (SPARK-21086) CrossValidator, TrainValidationSplit should preserve all models after fitting
Date Sat, 22 Jul 2017 01:56:01 GMT


yuhao yang commented on SPARK-21086:

Sure, indices sound fine.

As for driver memory, especially with CrossValidator, caching all the trained models would
be impractical and unnecessary. Although every model is collected to the driver, that happens
sequentially. With the current implementation of CrossValidator, GC can kick in and clear
the previous models, which is especially helpful when the models are large.
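
To illustrate the memory argument, here is a minimal sketch in plain Python (not Spark code). ToyModel, fit_keep_metrics_only and fit_keep_all are hypothetical names invented for this example. The first function fits models one at a time and keeps only the metric, so each model becomes collectable before the next one is fit; the second keeps every model alive.

```python
import gc
import weakref


class ToyModel:
    """Hypothetical stand-in for a large fitted model."""

    def __init__(self, param):
        self.param = param
        self.weights = [float(param)] * 1000  # pretend this is large

    def score(self):
        # Hypothetical metric that is highest when param == 3.
        return -abs(self.param - 3)


def fit_keep_metrics_only(params):
    """Fit sequentially, keeping only metrics (models become garbage)."""
    metrics, refs = [], []
    for p in params:
        model = ToyModel(p)
        metrics.append(model.score())
        refs.append(weakref.ref(model))  # weak reference: does not keep the model alive
        del model                        # drop the only strong reference
    return metrics, refs


def fit_keep_all(params):
    """Fit sequentially but retain every model, so none can be collected."""
    models = [ToyModel(p) for p in params]
    return [m.score() for m in models], models


if __name__ == "__main__":
    metrics, refs = fit_keep_metrics_only([1, 2, 3, 4])
    gc.collect()
    best_index = max(range(len(metrics)), key=metrics.__getitem__)
    print("best index:", best_index)
    print("models still alive:", sum(r() is not None for r in refs))
```

Returning only the index of the best parameter setting, as in the sketch, is enough to refit or look up the winner later without holding the other models.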

> CrossValidator, TrainValidationSplit should preserve all models after fitting
> -----------------------------------------------------------------------------
>                 Key: SPARK-21086
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
> I've heard multiple requests for having CrossValidatorModel and TrainValidationSplitModel
> preserve the full list of fitted models.  This sounds very valuable.
> One decision should be made before we do this: Should we save and load the models in
> ML persistence?  That could blow up the size of a saved Pipeline if the models are large.
> * I suggest *not* saving the models by default but allowing saving if specified.  We
> could specify whether to save the model as an extra Param for CrossValidatorModelWriter, but
> we would have to make sure to expose CrossValidatorModelWriter as a public API and modify
> the return type of CrossValidatorModel.write to be CrossValidatorModelWriter (but this will
> not be a breaking change).
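
The opt-in persistence suggested in the issue description could look roughly like the following plain-Python sketch. It is not Spark's API: ToyCVModel, ToyCVModelWriter, persist_sub_models and dumps are hypothetical names chosen for illustration, and JSON stands in for ML persistence. The point is only that the writer carries a flag that defaults to off, so a saved model stays small unless the caller asks for the sub-models.

```python
import json


class ToyCVModel:
    """Hypothetical fitted cross-validation result."""

    def __init__(self, best_model, sub_models):
        self.best_model = best_model
        self.sub_models = sub_models

    def write(self):
        # Returns a dedicated writer type so options can be set on it.
        return ToyCVModelWriter(self)


class ToyCVModelWriter:
    """Hypothetical writer; sub-models are NOT saved unless requested."""

    def __init__(self, model):
        self._model = model
        self._persist_sub_models = False  # default: keep saved output small

    def persist_sub_models(self, value):
        self._persist_sub_models = bool(value)
        return self  # allow chaining

    def dumps(self):
        payload = {"bestModel": self._model.best_model}
        if self._persist_sub_models:
            payload["subModels"] = self._model.sub_models
        return json.dumps(payload)


if __name__ == "__main__":
    model = ToyCVModel({"regParam": 0.1}, [{"regParam": 0.01}, {"regParam": 0.1}])
    print(model.write().dumps())
    print(model.write().persist_sub_models(True).dumps())
```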
