mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Towards 1.0 - Defining backwards compatibility guarantees
Date Tue, 01 Nov 2011 12:09:11 GMT
FWIW, in Lucene, we do the following:

1. All minor versions within a major release can read an index written by any prior version
in the same major release.  That is, 3.4 can read a 3.3 index.  However, 3.3 cannot read a 3.4 index.
When a user reads a 3.3 index w/ 3.4, it is silently upgraded to 3.4.  I think this versioning
scheme should work well for us too when it comes to models.  In the new 4.x line, we have a
Codec system which will make it fairly easy for any version to read any other version.

2. For APIs, we typically mark things as @lucene.experimental if we think they may change
within minor releases.  We also mark things as deprecated that are going away.  Deprecated
items are then removed on the next major release.  The upgrade path is usually to go to x.9,
remove all deprecations and then go to x+1.0.
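
To make the API marking in point 2 concrete, here is a minimal Java sketch. The Experimental annotation and the two clusterer classes are hypothetical names invented for illustration; they are not existing Mahout or Lucene types. As far as I know, Lucene expresses @lucene.experimental as a Javadoc tag, so a real annotation is only one possible way to do the same thing.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker: this API may change within minor releases.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Experimental {}

// Hypothetical new API that has not yet been declared stable.
@Experimental
class StreamingClusterer {
  static String describe(int clusters) {
    return "streaming:" + clusters;
  }
}

// Hypothetical old API, kept through the current major line and
// removed in the next major release.
class LegacyClusterer {
  /** @deprecated use {@link StreamingClusterer#describe(int)} instead. */
  @Deprecated
  static String describe(int clusters) {
    // Delegate so behavior stays identical until the old entry point is removed.
    return StreamingClusterer.describe(clusters);
  }
}
```

Compiling user code with -Xlint:deprecation against the last minor release then lists every call that has to change before moving to the next major version.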

We also communicate to users via release notes when we purposefully break back compat.

For the most part this works and I would recommend we take similar steps.  First steps would
be to start versioning our models and perhaps our input formats.  I suspect we could simply
take the Lucene code for this (I think it's a time stamp plus something else that I forget).
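
As a rough sketch of what versioning a model file could look like, the Java below writes a major/minor pair at the start of the file and applies the rule described above: a reader accepts files from the same major line whose minor version is not newer than its own. The ModelFormat class, its constants, and the two-integer header layout are assumptions made for illustration; this is not the Lucene code and not an existing Mahout format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper; the header layout (two ints) is an assumption.
final class ModelFormat {
  // Version of the code doing the reading and writing (illustrative values).
  static final int MAJOR = 1;
  static final int MINOR = 4;

  private ModelFormat() {}

  // Same major line, and the file is not newer than the reader.
  static boolean canRead(int fileMajor, int fileMinor) {
    return fileMajor == MAJOR && fileMinor <= MINOR;
  }

  // Write the current version before the model payload.
  static void writeHeader(DataOutputStream out) throws IOException {
    out.writeInt(MAJOR);
    out.writeInt(MINOR);
  }

  // Read the version and fail early if this reader cannot handle the file.
  static void checkHeader(DataInputStream in) throws IOException {
    int fileMajor = in.readInt();
    int fileMinor = in.readInt();
    if (!canRead(fileMajor, fileMinor)) {
      throw new IOException("Unsupported model version "
          + fileMajor + "." + fileMinor
          + " (reader is " + MAJOR + "." + MINOR + ")");
    }
  }
}
```

A model loader would call checkHeader before parsing anything else, so an incompatible file fails with a clear message instead of a confusing parse error.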


On Oct 29, 2011, at 11:45 PM, Isabel Drost wrote:

> Mahout seems to be at a stage where we have covered most of the interesting 
> machine learning problems, where it is being used in production by quite some 
> developers - hey, we even got a book that is now available in a printed version.
> Maybe it's time to start taking first steps towards a 1.0 release. One* 
> important step in my opinion is to define what kind of backwards compatibility 
> guarantees we want to give our users - and what guarantees our users really need 
> - after releasing 1.0.
> Just a rough list below - feel free to extend, shrink and change:
> 1) Data input formats - people probably do not want to re-generate vectors from 
> their original data every time they use a new Mahout version.
> 2) Model formats - people probably do not want to have to retrain a model only 
> to make it work with the latest and greatest features of a new Mahout release.
> 3) Model output - when upgrading users probably want to receive model output 
> that is then integrated in their system the same way as with the older release.
> 4) APIs - I don't see us keeping all interfaces or even abstract classes stable. 
> However users should know which APIs we consider "public facing" and will likely 
> keep stable. Maybe an annotation makes that clear?
> 5) Command line scripts - is there a significant user base relying on the 
> bin/mahout script to warrant working towards keeping that stable between 
> releases?
> Most likely I've forgotten about other vital pieces - just wanted to kick off 
> that discussion.
> Isabel
> * though not the only one - others include but are not limited to the time frame 
> for which we offer support for any given release.

Grant Ingersoll
