spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence
Date Mon, 12 Feb 2018 23:07:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361575#comment-16361575 ]

Joseph K. Bradley commented on SPARK-23154:
-------------------------------------------

I'd prefer to put it in the subsection on saving & loading.  I'll send a PR now.

[~yanboliang] I actually spent a long time trying to come up with ways to test this, and it's
non-trivial.  The main blocker is that I got pushback from others about putting binary files
(Parquet model data files) in the git repo.  Without that, there isn't a way to store example
models from past versions.  I may just build a separate project to test this outside of apache/spark
itself when I get the chance.  You can find more notes in the JIRA linked in the description
above.
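
A minimal sketch of how such a separate test project might select and load old models. It assumes, hypothetically, that models saved by each past release are stored as directories named after the Spark version (for example `fixtures/2.2.0/`). `check_fixtures` and `parse_version` are illustrative names, not Spark APIs. The `load` callable is injected so the selection logic can run without Spark; in a real harness it could be something like `pyspark.ml.PipelineModel.load`.

```python
import os
from typing import Callable, Dict, Tuple


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse a plain numeric 'X.Y.Z' name; raises ValueError otherwise."""
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (3 - len(parts))  # e.g. '2.3' -> (2, 3, 0)
    return parts[0], parts[1], parts[2]


def check_fixtures(root: str, current: str,
                   load: Callable[[str], object]) -> Dict[str, str]:
    """Load every fixture saved by an older (or equal) release with the same
    major version as `current`; return {version_dir: error} for failures."""
    cur = parse_version(current)
    failures: Dict[str, str] = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        try:
            saved = parse_version(name)
        except ValueError:
            continue  # not a version-named directory
        # Only same-major, not-newer fixtures are expected to load; loading
        # across major versions is best-effort under the proposed policy.
        if saved[0] != cur[0] or saved > cur:
            continue
        try:
            load(path)
        except Exception as e:  # record every failure instead of stopping early
            failures[name] = f"{type(e).__name__}: {e}"
    return failures
```

Injecting the loader keeps the (hypothetical) fixture layout and the version-selection rule testable on their own, while the binary Parquet fixtures themselves can live outside apache/spark.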

> Document backwards compatibility guarantees for ML persistence
> --------------------------------------------------------------
>
>                 Key: SPARK-23154
>                 URL: https://issues.apache.org/jira/browse/SPARK-23154
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, ML
>    Affects Versions: 2.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Major
>
> We have (as far as I know) maintained backwards compatibility for ML persistence, but this is not documented anywhere.  I'd like us to document it (for spark.ml, not for spark.mllib).
> I'd recommend something like:
> {quote}
> In general, MLlib maintains backwards compatibility for ML persistence.  I.e., if you save an ML model or Pipeline in one version of Spark, then you should be able to load it back and use it in a future version of Spark.  However, there are rare exceptions, described below.
> Model persistence: Is a model or Pipeline saved using Apache Spark ML persistence in Spark version X loadable by Spark version Y?
> * Major versions: No guarantees, but best-effort.
> * Minor and patch versions: Yes; these are backwards compatible.
> * Note about the format: There are no guarantees for a stable persistence format, but model loading itself is designed to be backwards compatible.
> Model behavior: Does a model or Pipeline in Spark version X behave identically in Spark version Y?
> * Major versions: No guarantees, but best-effort.
> * Minor and patch versions: Identical behavior, except for bug fixes.
> For both model persistence and model behavior, any breaking changes across a minor version or patch version are reported in the Spark version release notes. If a breakage is not reported in release notes, then it should be treated as a bug to be fixed.
> {quote}
> How does this sound?
> Note: We unfortunately don't have tests for backwards compatibility (which has technical hurdles and can be discussed in [SPARK-15573]).  However, we have made efforts to maintain it during PR review and Spark release QA, and most users expect it.
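
The proposed policy can be read as a lookup from a (saved-in, loaded-in) version pair to the expected guarantees. The following is a hypothetical sketch of the proposal text above, not a Spark API. It assumes plain numeric `X.Y.Z` version strings (no `-SNAPSHOT` suffixes) and treats loading in an older version than the one that saved the model as outside the proposal's scope.

```python
from typing import Dict, Optional, Tuple

# Guarantees as worded in the SPARK-23154 proposal (hypothetical encoding).
BEST_EFFORT = "no guarantees, but best-effort"
POLICY: Dict[str, Dict[str, str]] = {
    "major": {"loading": BEST_EFFORT, "behavior": BEST_EFFORT},
    "minor_or_patch": {
        "loading": "backwards compatible",
        "behavior": "identical, except for bug fixes",
    },
}


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse a plain numeric 'X.Y.Z' string; a missing patch part defaults to 0."""
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def expected_guarantees(saved_in: str, loaded_in: str) -> Optional[Dict[str, str]]:
    """Return the proposed loading/behavior guarantees for a model saved with
    Spark `saved_in` and loaded with Spark `loaded_in`, or None if the pair is
    outside the proposal (loading in an older version)."""
    saved, loaded = parse_version(saved_in), parse_version(loaded_in)
    if loaded < saved:
        return None
    if loaded[0] != saved[0]:
        return POLICY["major"]
    # Same major version: minor and patch upgrades (or the same version).
    return POLICY["minor_or_patch"]
```

For example, `expected_guarantees("2.2.0", "2.3.0")["loading"]` is `"backwards compatible"`. Under the proposal, a load failure for such a pair would either be listed in the release notes or be a bug to fix.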



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
