hivemall-user mailing list archives

From Shadi Mari <shadim...@gmail.com>
Subject Re: Hivemall FFM & Criteo Dataset - LogLess counter
Date Wed, 18 Oct 2017 12:34:44 GMT
Many thanks, Makoto. I will keep an eye on the GA release and repeat the
tests accordingly.

Shadi

On Wed, Oct 18, 2017 at 3:27 PM, Makoto Yui <myui@apache.org> wrote:

> With your test data, the loss at the first update is large, but the
> average loss per update is very small.
> https://github.com/apache/incubator-hivemall/blob/master/core/src/test/java/hivemall/fm/FieldAwareFactorizationMachineUDTFTest.java#L85
>
> It might be better to implement instance-wise L2 normalization to reduce
> the initial losses.
>
> Further investigation is required but I need to focus on the first
> Apache release for this month.
>
> GA of FFM will be in the v0.5.1 release, scheduled for December.
>
> Makoto
>
> 2017-10-18 1:36 GMT+09:00 Makoto Yui <myui@apache.org>:
> > Thanks. I'll test FFM with it tomorrow.
> >
> > Makoto
> >
> > 2017-10-18 1:19 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >> Attached is a sample of 500 examples from my training set, represented as
> >> vectors of features.
> >>
> >> Regards,
> >>
> >>
> >> On Tue, Oct 17, 2017 at 7:08 PM, Makoto Yui <myui@apache.org> wrote:
> >>>
> >>> I need to reproduce your test.
> >>>
> >>> Could you give me the sample (100~500 examples are enough) of your
> >>> training input in gzipped tsv/csv?
> >>>
> >>> FFM input format is <field>:<index>:<value>.
> >>>
> >>> Thanks,
> >>> Makoto
> >>>
> >>> 2017-10-18 0:59 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>> > Makoto,
> >>> >
> >>> > I am using the default hyper-parameters in addition to the following
> >>> > settings:
> >>> >
> >>> > feature_hashing: 20
> >>> > classification is enabled
> >>> > Iterations = 10
> >>> > K = 2, another test using K = 4
> >>> > Opt: FTRL (default)
> >>> >
> >>> > I tried setting the initial learning rate to 0.2 and the optimizer to
> >>> > AdaGrad, with no significant change in the empirical loss.
> >>> >
> >>> > Thanks
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Oct 17, 2017 at 6:51 PM, Makoto Yui <myui@apache.org> wrote:
> >>> >>
> >>> >> The empirical loss (cumulative logloss) is too large.
> >>> >>
> >>> >> The simple test in FieldAwareFactorizationMachineUDTFTest shows that
> >>> >> empirical loss is decreasing properly but it seems optimization is
> >>> >> not working correctly in your case.
> >>> >>
> >>> >> Could you show me the training hyperparameters?
> >>> >>
> >>> >> Makoto
> >>> >>
> >>> >> 2017-10-17 19:01 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>> >> > Hello,
> >>> >> >
> >>> >> > I am trying to understand the results produced by FFM on each
> >>> >> > iteration while training on the Criteo 2014 dataset.
> >>> >> >
> >>> >> > I have 10 mappers running concurrently (each with ~4.5M records),
> >>> >> > and the following is the output of one of the mappers:
> >>> >> >
> >>> >> > -----------------------------
> >>> >> >
> >>> >> > fm.FactorizationMachineUDTF|: Wrote 4479491 records to a temporary file for iterative training: hivemall_fm392724107368114556.sgmt (2.02 GiB)
> >>> >> > Iteration #2 [curLosses=1.5967339372694769E10, prevLosses=4.182558816480771E10, changeRate=0.6182399322209704, #trainingExamples=4479491]
> >>> >> >
> >>> >> > -----------------------------
> >>> >> >
> >>> >> > Looking at the source code, the FFM implementation uses LogLoss as the
> >>> >> > performance metric when classification is specified. However, the
> >>> >> > curLosses counter is very high: 1.5967339372694769E10
> >>> >> >
> >>> >> >
> >>> >> > What does this mean?
> >>> >> >
> >>> >> > Regards
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Makoto YUI <myui AT apache.org>
> >>> >> Research Engineer, Treasure Data, Inc.
> >>> >> http://myui.github.io/
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>
>
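
For reference, the numbers in the mapper log quoted in this thread can be checked with a few lines of Python. This is only a sketch: the changeRate formula below is inferred from the logged values (it reproduces them), not taken from the Hivemall source, and the variable names are made up for illustration. Dividing the cumulative loss by the number of training examples gives a per-example average; for binary logloss, always predicting 0.5 already yields ln 2 (about 0.693) per example, so an average in the thousands is consistent with the remark that the empirical loss is too large.

```python
# Values copied from the "Iteration #2" log line in the original post.
cur_losses = 1.5967339372694769e10
prev_losses = 4.182558816480771e10
n_examples = 4479491

# Assumed formula (inferred from the log, not confirmed from the source code):
# relative decrease of the cumulative loss between two iterations.
change_rate = (prev_losses - cur_losses) / prev_losses

# Cumulative loss divided by the example count gives a per-example average.
avg_loss = cur_losses / n_examples

print(f"changeRate: {change_rate:.6f}")
print(f"average loss per example: {avg_loss:.1f}")
```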

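The `<field>:<index>:<value>` input format and the instance-wise L2 normalization mentioned in this thread can be illustrated roughly as follows. This is a plain-Python sketch under stated assumptions, not Hivemall code: `parse_feature` and `l2_normalize` are hypothetical helper names, and the sample row is invented rather than taken from the attached data. Each instance's values are divided by that instance's own L2 norm, so every row ends up with unit length regardless of how large its raw values are.

```python
import math


def parse_feature(token):
    """Split one '<field>:<index>:<value>' token into (field, index, value)."""
    field, index, value = token.split(":")
    return field, index, float(value)


def l2_normalize(features):
    """Scale the values of a single instance so they have unit L2 norm."""
    norm = math.sqrt(sum(v * v for _, _, v in features))
    if norm == 0.0:
        # An all-zero instance cannot be scaled; return it unchanged.
        return list(features)
    return [(f, i, v / norm) for f, i, v in features]


# Invented example row (hypothetical, not from the attached sample).
row = ["1:5:3.0", "2:17:4.0"]
normalized = l2_normalize([parse_feature(t) for t in row])
print(normalized)
```

After normalization the squared values of the row sum to 1, which bounds the magnitude of each instance's contribution to the first updates.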