hivemall-user mailing list archives

From Shadi Mari <shadim...@gmail.com>
Subject Re: Early Stopping / FFMs performance
Date Wed, 11 Oct 2017 14:22:04 GMT
Many thanks, Makoto.

I noticed that ffm_train produces ~1.8 billion rows in the ffm_model table.
In fact, each Tez task produces its own separate model, identified by the
model ID. Is there a way to merge these models into a single model as a step
before prediction, so that at least the number of rows can be reduced?
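A minimal sketch of one way to do this reduction is parameter averaging: group the per-task rows by feature key and average each parameter over the tasks that emitted it, similar in spirit to the GROUP BY feature / avg(weight) pattern used to mix per-task linear models in Hive. It assumes (hypothetically) that each task's model has been exported into a Python dict mapping a feature key to a (linear weight, latent vector) pair; this is not Hivemall's model table schema, and whether an averaged FFM model keeps its accuracy is not verified here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not Hivemall's schema):
# feature key -> (linear weight, latent vector)
Params = Tuple[float, List[float]]
Model = Dict[str, Params]


def average_models(models: List[Model]) -> Model:
    """Merge per-task models by averaging each key's parameters
    over only the models that contain that key."""
    sums_w: Dict[str, float] = defaultdict(float)
    sums_v: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for model in models:
        for key, (w, v) in model.items():
            sums_w[key] += w
            if key in sums_v:
                # element-wise sum of latent vectors
                sums_v[key] = [a + b for a, b in zip(sums_v[key], v)]
            else:
                sums_v[key] = list(v)
            counts[key] += 1
    # divide the accumulated sums by the number of contributing models
    return {
        key: (sums_w[key] / n, [x / n for x in sums_v[key]])
        for key, n in counts.items()
    }


task1 = {"f1": (0.25, [1.0, 0.0]), "f2": (0.5, [1.0, 2.0])}
task2 = {"f1": (0.75, [0.0, 1.0])}
print(average_models([task1, task2]))
# {'f1': (0.5, [0.5, 0.5]), 'f2': (0.5, [1.0, 2.0])}
```

Averaging only over the tasks that contain a key (rather than treating a missing key as zero) avoids shrinking rare features toward zero; either choice is a judgment call.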

Given such a huge number of rows in the model table, I wonder how real-time
prediction is feasible with Hivemall.
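Real-time prediction generally means loading the (ideally merged) model into memory once and scoring each request directly, instead of joining every request against a billion-row table. The sketch below scores one example with the standard FFM formula plus the global bias and linear term that Makoto mentions Hivemall's FFM supports:

  score = w0 + sum_i w_i*x_i + sum_{i<j} <v_{i,f_j}, v_{j,f_i}> * x_i * x_j

The in-memory layout (dicts keyed by feature, and by (feature, field) for latent vectors) is a hypothetical assumption, as is the sigmoid at the end, which presumes a model trained with logistic loss.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical in-memory model layout (an assumption, not Hivemall's schema):
#   w0: global bias
#   w:  feature -> linear weight
#   V:  (feature, field of the other feature) -> latent vector
# An input example is a list of (field, feature, value) triples.
Example = List[Tuple[str, str, float]]


def ffm_score(w0: float,
              w: Dict[str, float],
              V: Dict[Tuple[str, str], List[float]],
              x: Example) -> float:
    """Raw FFM output: bias + linear terms + field-aware pairwise terms."""
    score = w0
    for _, feat, val in x:
        score += w.get(feat, 0.0) * val  # unseen features contribute nothing
    for a in range(len(x)):
        fa, ja, va = x[a]
        for b in range(a + 1, len(x)):
            fb, jb, vb = x[b]
            # feature ja uses its vector for field fb, and jb its vector for fa
            v1 = V.get((ja, fb))
            v2 = V.get((jb, fa))
            if v1 is None or v2 is None:
                continue
            score += sum(p * q for p, q in zip(v1, v2)) * va * vb
    return score


def predict_ctr(w0: float,
                w: Dict[str, float],
                V: Dict[Tuple[str, str], List[float]],
                x: Example) -> float:
    """Click probability as the logistic sigmoid of the raw score."""
    return 1.0 / (1.0 + math.exp(-ffm_score(w0, w, V, x)))


w = {"u1": 1.0, "ad9": -0.5}
V = {("u1", "ad"): [1.0, 2.0], ("ad9", "user"): [0.5, 0.25]}
x = [("user", "u1", 1.0), ("ad", "ad9", 1.0)]
print(ffm_score(0.5, w, V, x))  # 2.0
```

Each request touches only the model entries for its own active features, and the pairwise loop is quadratic in that count, so per-request cost stays small for a few dozen features; the main cost is holding the model in memory.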

Regards

On Wed, Oct 11, 2017 at 9:17 AM, Makoto Yui <myui@apache.org> wrote:

> Shadi,
>
> - First release (v0.5.0) Nov, 2017
> We plan to publish the first Apache release at the beginning of November.
> We are currently in the feature-freeze phase, except for minor patches.
>
> FFM is included but still in beta.
>
> - 2nd release (v0.5.1) Dec, 2017
> word2vec and FFM are skipped in the first release and will be included
> in the 2nd release in late Dec.
>
> - 3rd release (v0.6) Q1, 2018
> xgboost and multinomial logistic regression will be introduced in
> the 3rd release in Q1 2018.
> https://github.com/apache/incubator-hivemall/pull/93
>
> Thanks,
> Makoto
>
> 2017-10-11 2:02 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> > Do you have an anticipated timeframe for moving from beta to GA? My
> > observation is that Hivemall releases are not very frequent, so I would
> > like to get a sense of your next release cycle.
> >
> > Many thanks
> >
> > ________________________________
> > From: Makoto Yui <myui@apache.org>
> > Sent: Tuesday, October 10, 2017 7:33:37 PM
> > To: user@hivemall.incubator.apache.org
> > Cc: shadimari@gmail.com
> > Subject: Re: Early Stopping / FFMs performance
> >
> > Hi,
> >
> > 2017-10-11 1:10 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >> I am using the Criteo 2014 dataset for CTR prediction, which has 45M
> >> examples in total. Do you think 8 hours is still a reasonable training
> >> duration, given that I am using your EMR configuration? I never assumed
> >> it could be so time-consuming.
> >
> > I don't remember the exact number, but it took 5 hours or so with my
> > previous setting:
> > "-iters 10 -factors 4 -feature_hashing 20"
> >
> > FFM is a very computation-heavy algorithm, and training it takes time.
> >
> >> I already built a version from the master branch. Based on your
> >> feedback, I assume the FFM implementation cannot yet be used in
> >> production, correct?
> >
> > Yes, it's still in beta. Use it at your own risk.
> >
> > The FM implementation is stable and ready for production use.
> >
> > Hivemall's FFM supports a linear term and a global bias, as in plain FM;
> > these are not supported in libffm.
> > I have not yet gotten satisfactory prediction accuracy from the current
> > FFM implementation for the Criteo 2014 task.
> >
> > It may be due to the default hyperparameter settings, such as the
> > learning rate and the L1/L2 regularization parameters.
> > https://www.kaggle.com/c/criteo-display-ad-challenge/discussion/10555
> >
> > Thanks,
> > Makoto
> >
> > --
> > Makoto YUI <myui AT apache.org>
> > Research Engineer, Treasure Data, Inc.
> > http://myui.github.io/
>
