hivemall-user mailing list archives

From Shadi Mari <shadim...@gmail.com>
Subject Re: Hivemall FFM & Criteo Dataset - LogLoss counter
Date Sun, 22 Oct 2017 10:20:04 GMT
Hi Makoto,

I am not sure either; however, I came across recommendations to do
column-wise normalization as a prerequisite feature engineering step to get
good results on the Criteo dataset.

LIBFFM shows the following results when L2 norm is enabled, and it quits
after iteration 2 when early stopping is enabled, since va_logloss increases:

iter   tr_logloss   va_logloss
   1      0.45206      0.37309
   2      0.44257      0.37679

However, it shows NaN in the tr_logloss/va_logloss output when L2 norm is
disabled (I guess due to a bug in the logloss calculation).

I am still not sure why L2 norm produced worse results when enabled in
Hivemall.
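
For reference, by instance-wise L2 normalization I mean scaling each row's
feature values by the inverse of the row's L2 norm before the update. A
minimal Java sketch of that step (illustrative only, not Hivemall's or
LIBFFM's actual code):

    // InstanceL2Norm.java -- illustrative sketch, not Hivemall's implementation.
    public final class InstanceL2Norm {
        // Scales a row's feature values in place by 1/||x||_2 so each
        // training instance has unit L2 norm before the FFM update
        // (the step LIBFFM applies unless --no-norm is given).
        static void normalize(float[] values) {
            double sumSq = 0.0;
            for (float v : values) {
                sumSq += (double) v * v;
            }
            if (sumSq == 0.0) {
                return; // all-zero row: nothing to scale
            }
            final float inv = (float) (1.0 / Math.sqrt(sumSq));
            for (int i = 0; i < values.length; i++) {
                values[i] *= inv;
            }
        }
    }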

Regards,


On Fri, Oct 20, 2017 at 10:40 AM, Makoto Yui <myui@apache.org> wrote:

> Shadi,
>
> In my experience using your dataset on Hivemall, instance-wise L2
> normalization does not improve logloss.
> Element (column)-wise normalization would be beneficial for quantitative
> variables, but I'm not sure about the correctness of instance (row)-wise
> normalization.
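>
> (Here, column-wise normalization would mean rescaling each quantitative
> column independently, e.g. min-max scaling x' = (x - min_col) / (max_col -
> min_col), while instance-wise normalization rescales each row x_i by
> 1 / ||x_i||_2.)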
>
> You can find my test in this PR, and the unit tests in
> FieldAwareFactorizationMachineUDTFTest#testSample() and
> testSampleDisableNorm():
> https://github.com/apache/incubator-hivemall/pull/123
>
> Could you provide the logloss output of libffm with/without the "--no-norm"
> option, if possible?
>
> Thanks,
> Makoto
>
> 2017-10-18 21:52 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> > As you said earlier in another thread, LibFFM has a built-in feature to
> > do L2 instance-wise normalization, and you are probably right, as most of
> > the implementations I have encountered have normalization as a built-in
> > feature.
> >
> > E.g.
> > https://github.com/RTBHOUSE/cuda-ffm/blob/fcda42dfd6914ff881fc503e6bbc4c97d983de5f/src/ffm_trainer.cu
> >
> > BTW, I was able to get a logloss of 0.37xxx when testing using LibFFM.
> >
> > Shadi
> >
> > On Wed, Oct 18, 2017 at 3:37 PM, Makoto Yui <myui@apache.org> wrote:
> >>
> >> I guess instance-wise l2 normalization is mandatory for FFM.
> >> https://github.com/guestwalk/libffm/blob/master/ffm.cpp#L688
> >>
> >> https://github.com/CNevd/libffm-ftrl/blob/4247440cc190346daa0b675135e0542e4933cb0f/ffm.cpp#L310
> >>
> >> Makoto
> >>
> >> 2017-10-18 21:27 GMT+09:00 Makoto Yui <myui@apache.org>:
> >> > In your test, the loss is large at the first update, but the average
> >> > loss for each update is very small.
> >> >
> >> > https://github.com/apache/incubator-hivemall/blob/master/core/src/test/java/hivemall/fm/FieldAwareFactorizationMachineUDTFTest.java#L85
> >> >
> >> > It might be better to implement instance-wise L2 normalization to
> >> > reduce initial losses.
> >> >
> >> > Further investigation is required but I need to focus on the first
> >> > Apache release for this month.
> >> >
> >> > GA of FFM will be the v0.5.1 release, scheduled for December.
> >> >
> >> > Makoto
> >> >
> >> > 2017-10-18 1:36 GMT+09:00 Makoto Yui <myui@apache.org>:
> >> >> Thanks. I'll test FFM with it tomorrow.
> >> >>
> >> >> Makoto
> >> >>
> >> >> 2017-10-18 1:19 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >> >>> Attached is a sample of 500 examples from my training set,
> >> >>> represented as vectors of features.
> >> >>>
> >> >>> Regards,
> >> >>>
> >> >>>
> >> >>> On Tue, Oct 17, 2017 at 7:08 PM, Makoto Yui <myui@apache.org> wrote:
> >> >>>>
> >> >>>> I need to reproduce your test.
> >> >>>>
> >> >>>> Could you give me the sample (100~500 examples are enough) of your
> >> >>>> training input in gzipped tsv/csv?
> >> >>>>
> >> >>>> FFM input format is <field>:<index>:<value>.
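> >> >>>>
> >> >>>> For example, a positive instance with three features might look like
> >> >>>> this (field/index values made up for illustration):
> >> >>>>
> >> >>>> 1 0:5:1.0 1:27:1.0 2:53:0.5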
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Makoto
> >> >>>>
> >> >>>> 2017-10-18 0:59 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >> >>>> > Makoto,
> >> >>>> >
> >> >>>> > I am using the default hyper-parameters in addition to the
> >> >>>> > following settings:
> >> >>>> >
> >> >>>> > feature_hashing: 20
> >> >>>> > classification is enabled
> >> >>>> > Iterations = 10
> >> >>>> > K = 2, another test using K = 4
> >> >>>> > Opt: FTRL (default)
> >> >>>> >
> >> >>>> > I tried setting the initial learning rate to 0.2 and the optimizer
> >> >>>> > to AdaGrad, with no significant change in the empirical loss.
> >> >>>> >
> >> >>>> > Thanks
> >> >>>> >
> >> >>>> >
> >> >>>> > On Tue, Oct 17, 2017 at 6:51 PM, Makoto Yui <myui@apache.org>
> >> >>>> > wrote:
> >> >>>> >>
> >> >>>> >> The empirical loss (cumulative logloss) is too large.
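> >> >>>> >>
> >> >>>> >> (Roughly: curLosses / #trainingExamples = 1.5967E10 / 4479491 ≈
> >> >>>> >> 3.6E3 average logloss per example, whereas a reasonably trained
> >> >>>> >> binary classifier should average well below 1.0.)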
> >> >>>> >>
> >> >>>> >> The simple test in FieldAwareFactorizationMachineUDTFTest shows
> >> >>>> >> that the empirical loss is decreasing properly, but it seems
> >> >>>> >> optimization is not working correctly in your case.
> >> >>>> >>
> >> >>>> >> Could you show me the training hyperparameters?
> >> >>>> >>
> >> >>>> >> Makoto
> >> >>>> >>
> >> >>>> >> 2017-10-17 19:01 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >> >>>> >> > Hello,
> >> >>>> >> >
> >> >>>> >> > I am trying to understand the results produced by FFM on each
> >> >>>> >> > iteration during the training of the Criteo 2014 dataset.
> >> >>>> >> >
> >> >>>> >> > Basically, I have 10 mappers running concurrently (each has
> >> >>>> >> > ~4.5M records), and the following is the output of one of the
> >> >>>> >> > mappers:
> >> >>>> >> >
> >> >>>> >> > -----------------------------
> >> >>>> >> >
> >> >>>> >> > fm.FactorizationMachineUDTF|: Wrote 4479491 records to a
> >> >>>> >> > temporary file for iterative training:
> >> >>>> >> > hivemall_fm392724107368114556.sgmt (2.02 GiB)
> >> >>>> >> > Iteration #2 [curLosses=1.5967339372694769E10,
> >> >>>> >> > prevLosses=4.182558816480771E10, changeRate=0.6182399322209704,
> >> >>>> >> > #trainingExamples=4479491]
> >> >>>> >> >
> >> >>>> >> > -----------------------------
> >> >>>> >> >
> >> >>>> >> > Looking at the source code, the FFM implementation uses the
> >> >>>> >> > LogLoss performance metric when classification is specified;
> >> >>>> >> > however, the curLosses counter is very high:
> >> >>>> >> > 1.5967339372694769E10.
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >> > What does this mean?
> >> >>>> >> >
> >> >>>> >> > Regards
> >> >>>> >> >
> >> >>>> >> >
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>
> >> >>>> >> --
> >> >>>> >> Makoto YUI <myui AT apache.org>
> >> >>>> >> Research Engineer, Treasure Data, Inc.
> >> >>>> >> http://myui.github.io/
> >> >>>> >
> >> >>>> >
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Makoto YUI <myui AT apache.org>
> >> >>>> Research Engineer, Treasure Data, Inc.
> >> >>>> http://myui.github.io/
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Makoto YUI <myui AT apache.org>
> >> >> Research Engineer, Treasure Data, Inc.
> >> >> http://myui.github.io/
> >> >
> >> >
> >> >
> >> > --
> >> > Makoto YUI <myui AT apache.org>
> >> > Research Engineer, Treasure Data, Inc.
> >> > http://myui.github.io/
> >>
> >>
> >>
> >> --
> >> Makoto YUI <myui AT apache.org>
> >> Research Engineer, Treasure Data, Inc.
> >> http://myui.github.io/
> >
> >
>
>
>
> --
> Makoto YUI <myui AT apache.org>
> Research Engineer, Treasure Data, Inc.
> http://myui.github.io/
>
