hivemall-user mailing list archives

From Shadi Mari <shadim...@gmail.com>
Subject Re: Hivemall FFM & Criteo Dataset - LogLoss counter
Date Mon, 23 Oct 2017 16:01:10 GMT
Thanks for the follow up and clarification Makoto.

That was very helpful indeed.

On Mon, Oct 23, 2017 at 10:08 AM, Makoto Yui <myui@apache.org> wrote:

> LibFFM itself does not do feature hashing.
>
> They applied feature hashing in [1] AFTER converting a quantitative
> variable v to a categorical variable floor(log(v)^2) [2].
> hash("c1:1") for categorical variables for hash("i1:floor(log(v)^2)")
> for quantitative variables.
>
> [1] https://github.com/guestwalk/kaggle-2014-criteo/blob/master/converters/pre-b.py#L21
> [2] https://www.kaggle.com/c/criteo-display-ad-challenge/discussion/10555
>     | However, too many features are generated if numerical features
>     | are directly transformed into categorical features, so we use
>     | v <- floor(log(v)^2)
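
For illustration, here is a minimal Python sketch of that preprocessing step
(the hash function, hash-space size, and helper names are stand-ins, not the
actual code in pre-b.py):

    import math
    import zlib

    HASH_SPACE = 2 ** 24  # illustrative hash-space size, not the converter's setting

    def hash_feature(s):
        # stand-in hash; crc32 is used only so the example is deterministic
        return zlib.crc32(s.encode("utf-8")) % HASH_SPACE

    def encode_categorical(field, value):
        # categorical: hash("<field>:<value>") -> index, weight fixed at 1
        return hash_feature("%s:%s" % (field, value)), 1.0

    def encode_quantitative(field, v):
        # quantitative: discretize v as floor(log(v)^2), then hash like a categorical
        # (zero/small values need special-casing as in the converter; simplified here)
        bucket = int(math.floor(math.log(v) ** 2)) if v > 1 else 0
        return hash_feature("%s:%d" % (field, bucket)), 1.0

    print(encode_categorical("c1", "1"))
    print(encode_quantitative("i1", 47.0))  # bucket = floor(log(47)^2) = 14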
>
> FM (and FFM) do not work well for quantitative variables; the feature
> <value> should be in the range [0,1].
> It is better to apply column-wise min-max normalization to
> quantitative variables.
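
A rough sketch of the column-wise min-max scaling suggested above (a generic
illustration, not Hivemall's implementation):

    def minmax_normalize_column(values):
        # scale one quantitative column into [0, 1]; constant columns map to 0.0
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]

    print(minmax_normalize_column([0.0, 5.0, 10.0]))  # [0.0, 0.5, 1.0]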
>
> The instance-wise normalization used in LibFFM would be effective where
> all feature values are 1.0 (i.e., categorical), but not for a mix of
> quantitative and categorical variables.
>
> Thanks,
> Makoto
>
> 2017-10-23 14:57 GMT+09:00 Makoto Yui <myui@apache.org>:
> > Hashing a quantitative variable <index>:<value> into a one-hot categorical
> > variable <hashed index>:1.0 will cause overfitting.
> >
> > Thanks,
> > Makoto
> >
> > 2017-10-23 12:28 GMT+09:00 Makoto Yui <myui@apache.org>:
> >> Shadi,
> >>
> >> That would be a bug in LibFFM: quantitative feature values always
> >> become 1.
> >>
> >> Feature hashing should only be applied to the feature index, as is
> >> done in Hivemall.
> >>
> >> Thanks,
> >> Makoto
> >>
> >> 2017-10-22 21:15 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>> Makoto,
> >>>
> >>> One difference I noticed in the data pre-processing between
> >>> Hivemall and LibFFM is the feature vector output representation,
> >>> as follows:
> >>>
> >>> Following your procedure to prepare data for CTR prediction on the
> >>> Criteo dataset, the following format is generated prior to training.
> >>>
> >>> Quantitative features: <Field Index>:<Hash of FeatureIndex>:<non-hashed value>
> >>> Categorical features:  <Field Index>:<Hash of FeatureName#Feature>:1
> >>>
> >>> ["1:8536434:0","2:16407656:1.6094379124341003","3:10302140:
> 2.0794415416798357","4:13706829:0.6931471805599453","
> 5:5455315:9.374582815370232","6:13222850:6.222576268071369","7:12819885:1.
> 6094379124341003","8:9792504:1.6094379124341003","9:
> 3595550:3.367295829986474","10:13416838:0","11:4501267:0.
> 6931471805599453","13:5063761:0.6931471805599453","14:
> 3679010:1","15:2585922:1","16:13108969:1","17:2885220:1","
> 18:11298716:1","19:5890546:1","20:5122308:1","21:5086101:1",
> "22:5141835:1","23:8746837:1","24:13052418:1","25:9200338:1"
> ,"26:10834772:1","27:7431621:1","28:7180825:1","29:1682397:
> 1","30:12428939:1","31:16091328:1","32:9800218:1","
> 33:14626356:1","34:4306667:1","36:3679177:1","37:7616293:1",
> "38:15086658:1","39:16037600:1"]
> >>>
> >>> Following the LibFFM approach, the following is generated for both
> >>> quantitative and categorical features:
> >>> <Field>:<Hash of Feature>:1
> >>>
> >>> 0:40189:1 1:498397:1 2:131438:1 3:947702:1 4:205745:1 5:786172:1
> >>> 6:754008:1 7:514500:1 8:735727:1 9:255381:1 10:756430:1 11:832677:1
> >>> 12:120252:1 13:172672:1 14:398230:1 15:98079:1 16:203633:1 17:397270:1
> >>> 18:182671:1 19:926643:1 20:241196:1 21:198788:1 22:392776:1 23:666512:1
> >>> 24:540660:1 25:807931:1 26:78061:1 27:808848:1 28:503744:1 29:166818:1
> >>> 30:755327:1 31:765122:1 32:382381:1 33:763792:1 34:541960:1 35:979212:1
> >>> 36:422675:1 37:396665:1 38:888004:1
> >>>
> >>> Note: quantitative features are hashed in LibFFM, whereas in your
> >>> procedure only the index is hashed and the value is provided to the
> >>> training algorithm as is.
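
To make the difference concrete, a small sketch of the two encodings for one
quantitative feature (the hash function and the field/feature names are
illustrative, not either tool's actual code):

    import zlib

    def h(s, space=2 ** 24):
        # illustrative stable hash, not the hash used by either tool
        return zlib.crc32(s.encode("utf-8")) % space

    field, name, value = 2, "i2", 1.6094379124341003  # log-scaled raw value

    # Hivemall-style: hash only the feature index, keep the value as is
    hivemall_style = "%d:%d:%s" % (field, h(name), value)  # <field>:<hashed index>:<value>

    # LibFFM-converter-style: hash the discretized feature, weight becomes 1
    libffm_style = "%d:%d:1" % (field, h("%s:%d" % (name, int(value))))  # <field>:<hashed feature>:1

    print(hivemall_style)
    print(libffm_style)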
> >>>
> >>> I think this is causing the issue I am facing!
> >>>
> >>> Regards
> >>>
> >>> On Sun, Oct 22, 2017 at 1:20 PM, Shadi Mari <shadimari@gmail.com> wrote:
> >>>>
> >>>> Hi Makoto,
> >>>>
> >>>> I am not sure I have come across any recommendation to do column-wise
> >>>> normalization as a prerequisite feature-engineering step to get good
> >>>> results on the Criteo dataset.
> >>>>
> >>>> LibFFM shows the following results when L2 norm is enabled, and it
> >>>> quits after iteration 2 when early stopping is enabled.
> >>>>
> >>>> iter   tr_logloss   va_logloss
> >>>>    1      0.45206      0.37309
> >>>>    2      0.44257      0.37679
> >>>>
> >>>> However (I guess due to a bug in the logloss calculation), it shows
> >>>> NaN in the tr_logloss/va_logloss output when L2 norm is disabled.
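
For reference, a minimal logloss sketch; the NaN above is what appears when a
predicted probability hits exactly 0 or 1 and log(0) is taken, which is why
implementations usually clip (a generic illustration, not LibFFM's code):

    import math

    def logloss(y_true, p_pred, eps=1e-15):
        # mean negative log-likelihood; clipping p avoids log(0) -> -inf / NaN
        total = 0.0
        for y, p in zip(y_true, p_pred):
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        return total / len(y_true)

    print(logloss([1, 0, 1], [0.9, 0.2, 0.8]))  # ~0.18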
> >>>>
> >>>> I am still not sure why L2 norm produced worse results when enabled
> >>>> in Hivemall.
> >>>>
> >>>> Regards,
> >>>>
> >>>>
> >>>> On Fri, Oct 20, 2017 at 10:40 AM, Makoto Yui <myui@apache.org> wrote:
> >>>>>
> >>>>> Shadi,
> >>>>>
> >>>>> In my experience using your dataset on Hivemall, instance-wise L2
> >>>>> normalization does not improve logloss.
> >>>>> Element (column)-wise normalization would be beneficial for
> >>>>> quantitative variables, but I'm not sure about the correctness of
> >>>>> instance (row)-wise normalization.
> >>>>>
> >>>>> You can find my test in this PR; see the unit tests in
> >>>>> FieldAwareFactorizationMachineUDTFTest#testSamle() and
> >>>>> testSampleDisableNorm():
> >>>>> https://github.com/apache/incubator-hivemall/pull/123
> >>>>>
> >>>>> Could you provide the logloss output of libffm with/without the
> >>>>> "--no-norm" option, if possible?
> >>>>>
> >>>>> Thanks,
> >>>>> Makoto
> >>>>>
> >>>>> 2017-10-18 21:52 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>>>> > As you said earlier in another thread, LibFFM has a built-in
> >>>>> > feature to do L2 instance-wise normalization, and you are probably
> >>>>> > right, as most of the implementations I have encountered have
> >>>>> > normalization as a built-in feature.
> >>>>> >
> >>>>> > E.g.
> >>>>> >
> >>>>> > https://github.com/RTBHOUSE/cuda-ffm/blob/fcda42dfd6914ff881fc503e6bbc4c97d983de5f/src/ffm_trainer.cu
> >>>>> >
> >>>>> > BTW, I was able to get a logloss of 0.37xxx when testing using
> >>>>> > LibFFM.
> >>>>> >
> >>>>> > Shadi
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > On Wed, Oct 18, 2017 at 3:37 PM, Makoto Yui <myui@apache.org> wrote:
> >>>>> >>
> >>>>> >> I guess instance-wise L2 normalization is mandatory for FFM.
> >>>>> >> https://github.com/guestwalk/libffm/blob/master/ffm.cpp#L688
> >>>>> >>
> >>>>> >>
> >>>>> >> https://github.com/CNevd/libffm-ftrl/blob/4247440cc190346daa0b675135e0542e4933cb0f/ffm.cpp#L310
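
A sketch of what instance (row)-wise L2 normalization amounts to, assuming it
simply rescales each example's values so their squared sum is 1 (a
simplification for illustration, not a copy of ffm.cpp):

    import math

    def l2_normalize_instance(values):
        # rescale one example's feature values so that sum(v^2) == 1
        norm = math.sqrt(sum(v * v for v in values))
        if norm == 0.0:
            return values
        return [v / norm for v in values]

    # for a purely categorical row (all values 1.0) this just divides by sqrt(#features)
    print(l2_normalize_instance([1.0, 1.0, 1.0, 1.0]))  # [0.5, 0.5, 0.5, 0.5]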
> >>>>> >>
> >>>>> >> Makoto
> >>>>> >>
> >>>>> >> 2017-10-18 21:27 GMT+09:00 Makoto Yui <myui@apache.org>:
> >>>>> >> > At the first update, the loss is large, but the average loss
> >>>>> >> > for each update is very small using your test.
> >>>>> >> >
> >>>>> >> >
> >>>>> >> > https://github.com/apache/incubator-hivemall/blob/master/core/src/test/java/hivemall/fm/FieldAwareFactorizationMachineUDTFTest.java#L85
> >>>>> >> >
> >>>>> >> > It might be better to implement instance-wise L2 normalization
> >>>>> >> > to reduce the initial losses.
> >>>>> >> >
> >>>>> >> > Further investigation is required, but I need to focus on the
> >>>>> >> > first Apache release this month.
> >>>>> >> >
> >>>>> >> > GA of FFM will be in the v0.5.1 release, scheduled for December.
> >>>>> >> >
> >>>>> >> > Makoto
> >>>>> >> >
> >>>>> >> > 2017-10-18 1:36 GMT+09:00 Makoto Yui <myui@apache.org>:
> >>>>> >> >> Thanks. I'll test FFM with it tomorrow.
> >>>>> >> >>
> >>>>> >> >> Makoto
> >>>>> >> >>
> >>>>> >> >> 2017-10-18 1:19 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>>>> >> >>> Attached is a sample of 500 examples from my training set,
> >>>>> >> >>> represented as vectors of features.
> >>>>> >> >>>
> >>>>> >> >>> Regards,
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>> On Tue, Oct 17, 2017 at 7:08 PM, Makoto Yui <myui@apache.org> wrote:
> >>>>> >> >>>>
> >>>>> >> >>>> I need to reproduce your test.
> >>>>> >> >>>>
> >>>>> >> >>>> Could you give me a sample (100~500 examples are enough) of
> >>>>> >> >>>> your training input in gzipped TSV/CSV?
> >>>>> >> >>>>
> >>>>> >> >>>> FFM input format is <field>:<index>:<value>.
> >>>>> >> >>>>
> >>>>> >> >>>> Thanks,
> >>>>> >> >>>> Makoto
> >>>>> >> >>>>
> >>>>> >> >>>> 2017-10-18 0:59 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>>>> >> >>>> > Makoto,
> >>>>> >> >>>> >
> >>>>> >> >>>> > I am using the default hyper-parameters in addition to
> >>>>> >> >>>> > the following settings:
> >>>>> >> >>>> >
> >>>>> >> >>>> > feature_hashing: 20
> >>>>> >> >>>> > classification is enabled
> >>>>> >> >>>> > Iterations = 10
> >>>>> >> >>>> > K = 2, another test using K = 4
> >>>>> >> >>>> > Opt: FTRL (default)
> >>>>> >> >>>> >
> >>>>> >> >>>> > I tried setting the initial learning rate to 0.2 and the
> >>>>> >> >>>> > optimizer to AdaGrad, with no significant change in the
> >>>>> >> >>>> > empirical loss.
> >>>>> >> >>>> >
> >>>>> >> >>>> > Thanks
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>> > On Tue, Oct 17, 2017 at 6:51 PM, Makoto Yui <myui@apache.org> wrote:
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> The empirical loss (cumulative logloss) is too large.
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> The simple test in FieldAwareFactorizationMachineUDTFTest
> >>>>> >> >>>> >> shows that the empirical loss is decreasing properly, but
> >>>>> >> >>>> >> it seems the optimization is not working correctly in
> >>>>> >> >>>> >> your case.
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> Could you show me the training hyperparameters?
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> Makoto
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> 2017-10-17 19:01 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> >>>>> >> >>>> >> > Hello,
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > I am trying to understand the results produced by FFM
> >>>>> >> >>>> >> > on each iteration during training on the Criteo 2014
> >>>>> >> >>>> >> > dataset.
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > Basically, I have 10 mappers running concurrently (each
> >>>>> >> >>>> >> > has ~4.5M records), and the following is the output from
> >>>>> >> >>>> >> > one of the mappers:
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > -----------------------------
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > fm.FactorizationMachineUDTF|: Wrote 4479491 records
> >>>>> >> >>>> >> > to a temporary file for iterative training:
> >>>>> >> >>>> >> > hivemall_fm392724107368114556.sgmt (2.02 GiB)
> >>>>> >> >>>> >> > Iteration #2 [curLosses=1.5967339372694769E10,
> >>>>> >> >>>> >> > prevLosses=4.182558816480771E10,
> >>>>> >> >>>> >> > changeRate=0.6182399322209704,
> >>>>> >> >>>> >> > #trainingExamples=4479491]
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > -----------------------------
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > Looking at the source code, the FFM implementation uses
> >>>>> >> >>>> >> > the logloss performance metric when classification is
> >>>>> >> >>>> >> > specified; however, the curLosses counter is very high:
> >>>>> >> >>>> >> > 1.5967339372694769E10
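
For scale, assuming curLosses is a cumulative sum over all examples processed
in the iteration, the per-example average works out to roughly:

    cur_losses = 1.5967339372694769e10
    n_examples = 4479491
    print(cur_losses / n_examples)  # ~3564.5 per example, versus the ~0.37
                                    # logloss reported with LibFFM elsewhere
                                    # in this thread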
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > What does this mean?
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> > Regards
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >> >
> >>>>> >> >>>> >>
> >>>>> >> >>>> >>
> >>>>> >> >>>> >>
> >>>>> >> >>>> >> --
> >>>>> >> >>>> >> Makoto YUI <myui AT apache.org>
> >>>>> >> >>>> >> Research Engineer, Treasure Data,
Inc.
> >>>>> >> >>>> >> http://myui.github.io/
> >>>>> >> >>>> >
> >>>>> >> >>>> >
> >>>>> >> >>>>
> >>>>> >> >>>>
> >>>>> >> >>>>
> >>>>> >> >>>> --
> >>>>> >> >>>> Makoto YUI <myui AT apache.org>
> >>>>> >> >>>> Research Engineer, Treasure Data, Inc.
> >>>>> >> >>>> http://myui.github.io/
> >>>>> >> >>>
> >>>>> >> >>>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >> --
> >>>>> >> >> Makoto YUI <myui AT apache.org>
> >>>>> >> >> Research Engineer, Treasure Data, Inc.
> >>>>> >> >> http://myui.github.io/
> >>>>> >> >
> >>>>> >> >
> >>>>> >> >
> >>>>> >> > --
> >>>>> >> > Makoto YUI <myui AT apache.org>
> >>>>> >> > Research Engineer, Treasure Data, Inc.
> >>>>> >> > http://myui.github.io/
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> --
> >>>>> >> Makoto YUI <myui AT apache.org>
> >>>>> >> Research Engineer, Treasure Data, Inc.
> >>>>> >> http://myui.github.io/
> >>>>> >
> >>>>> >
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Makoto YUI <myui AT apache.org>
> >>>>> Research Engineer, Treasure Data, Inc.
> >>>>> http://myui.github.io/
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Makoto YUI <myui AT apache.org>
> >> Research Engineer, Treasure Data, Inc.
> >> http://myui.github.io/
> >
> >
> >
> > --
> > Makoto YUI <myui AT apache.org>
> > Research Engineer, Treasure Data, Inc.
> > http://myui.github.io/
>
>
>
> --
> Makoto YUI <myui AT apache.org>
> Research Engineer, Treasure Data, Inc.
> http://myui.github.io/
>
