mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Persisting trained models in Mahout
Date Thu, 08 Dec 2011 13:19:39 GMT
Yes, I mean you need to write it and read it in your own code.
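As a rough illustration of "write it and read it in your own code", here is a minimal sketch that stores precomputed pairwise similarities in a text file and reads them back. It uses only the JDK. The class name SimilarityStore, the Pair holder and the "userA,userB,value" line format are assumptions made for this example, not part of Mahout; the loaded pairs would still have to be handed to something like the GenericUserSimilarity mentioned further down the thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Mahout class: persists user-user similarities as text.
public class SimilarityStore {

    /** One precomputed similarity between two user IDs (illustrative holder type). */
    public static final class Pair {
        public final long userA;
        public final long userB;
        public final double value;

        public Pair(long userA, long userB, double value) {
            this.userA = userA;
            this.userB = userB;
            this.value = value;
        }
    }

    /** Writes one "userA,userB,value" line per pair. */
    public static void save(Path file, List<Pair> pairs) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Pair p : pairs) {
                out.write(p.userA + "," + p.userB + "," + p.value);
                out.newLine();
            }
        }
    }

    /** Reads the pairs back in file order; blank lines are skipped. */
    public static List<Pair> load(Path file) throws IOException {
        List<Pair> pairs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split(",");
                pairs.add(new Pair(Long.parseLong(fields[0]),
                                   Long.parseLong(fields[1]),
                                   Double.parseDouble(fields[2])));
            }
        }
        return pairs;
    }
}
```

A plain text file is used here only because it is easy to inspect; a database table or a binary format would serve equally well.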

What do you mean by training a model? Computing similarities? I don't know
if there's such a thing here as "training" on one data set and running on
another. The implementations always use all currently available info. Is
this a cold-start issue?

OutOfMemoryError has nothing to do with this; on such a small data set it
indicates you didn't raise your JVM heap size above the default.
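To check what heap limit the JVM is actually running with, a small sketch like the following can help. The class name HeapCheck is made up for this example. The limit itself is raised with the standard -Xmx launcher option, for instance "java -Xmx1g HeapCheck", where 1g is only an illustrative value.

```java
// Hypothetical helper: reports the maximum heap the running JVM may use.
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in mebibytes. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with and once without -Xmx shows whether the setting is reaching the JVM that runs the recommender code.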


On Thu, Dec 8, 2011 at 1:02 PM, Vinod <pillvin@gmail.com> wrote:

> Hi Sean,
>
> Neither Recommender nor any of its parent interfaces extends Serializable, so
> there is no way that I'd be able to serialize it.
>
> I agree that the implementations may not have startup overhead. However,
> training a model on millions of rows is a CPU-, memory- and time-consuming
> activity. For example, when the data set is changed from 100K to 1M in chapter
> 4, the program crashes with an OutOfMemoryError after a significant amount of
> time.
>
> I feel that training should be done in development only. Once a developer
> is OK with the test results, he should be able to save an instance of the
> trained and tested model (e.g. a recommender or classifier).
>
> Only these saved instances of trained and tested models should be deployed
> to production.
>
> Thoughts?
>
> regards,
> Vinod
>
>
>
> On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > Ah right. No, there's still not a provision for this. You would just have
> > to serialize it yourself if you like.
> > Most of the implementations don't have a great deal of startup overhead, so
> > don't really need this. The exception is perhaps slope-one, but there you
> > can actually save and supply pre-computed diffs.
> > Still it would be valid to store and re-supply user-user similarities or
> > something. You can do this, manually, by querying for user-user
> > similarities, saving them, then loading them and supplying them via
> > GenericUserSimilarity for instance.
> >
> > On Thu, Dec 8, 2011 at 12:27 PM, Vinod <pillvin@gmail.com> wrote:
> >
> > > Hi Sean,
> > >
> > > Thanks for the quick response.
> > >
> > > By model, I am not referring to the data model but to a "trained"
> > > recommender instance.
> > >
> > > Weka, for example, has the ability to save and load models:
> > > http://weka.wikispaces.com/Serialization
> > > http://weka.wikispaces.com/Saving+and+loading+models
> > >
> > > This avoids the need to train the model (recommender) every time a server
> > > is bounced or the program is restarted.
> > >
> > > regards,
> > > Vinod
> > >
> > >
> > > On Thu, Dec 8, 2011 at 5:43 PM, Sean Owen <srowen@gmail.com> wrote:
> > >
> > > > The classes aren't Serializable, no. In the case of DataModel, it's
> > > > assumed that you already have some persisted model somewhere, in a DB
> > > > or file or something, so this would be redundant.
> > > >
> > > > On Thu, Dec 8, 2011 at 12:07 PM, Vinod <pillvin@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is my first day of experimentation with Mahout. I am following
> > > > > the "Mahout in Action" book and, looking at the sample code provided,
> > > > > it seems that models (e.g. a recommender) need to be trained at the
> > > > > start of the program (start/restart). The Recommender interface
> > > > > extends Refreshable, which doesn't extend Serializable. So, I am
> > > > > wondering if Mahout provides an alternate mechanism to persist
> > > > > trained models (a recommender instance in this case).
> > > > >
> > > > > Apologies if this is a very silly question.
> > > > >
> > > > > Thanks & regards,
> > > > > Vinod
> > > > >
> > > >
> > >
> >
>
