hivemall-user mailing list archives

From Makoto Yui <m...@apache.org>
Subject Re: Hivemall FFM & Criteo Dataset - LogLess counter
Date Tue, 17 Oct 2017 16:36:59 GMT
Thanks. I'll test FFM with it tomorrow.

Makoto

2017-10-18 1:19 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
> Attached is a sample of 500 examples from my training set, represented as
> feature vectors.
>
> Regards,
>
>
> On Tue, Oct 17, 2017 at 7:08 PM, Makoto Yui <myui@apache.org> wrote:
>>
>> I need to reproduce your test.
>>
>> Could you give me a sample (100 to 500 examples is enough) of your
>> training input as gzipped TSV/CSV?
>>
>> FFM input format is <field>:<index>:<value>.
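A minimal sketch of producing strings in the `<field>:<index>:<value>` shape mentioned above. The helper name `to_ffm_features`, the use of CRC32 as the hash, and the sample field values are all assumptions made for illustration; this is not Hivemall's own hashing, and the 20-bit index space only mirrors the `feature_hashing: 20` setting quoted in this thread.

```python
import zlib


def to_ffm_features(row, num_bits=20):
    """Turn {field_id: raw_categorical_value} into '<field>:<index>:<value>' strings.

    Hypothetical helper: CRC32 stands in for whatever hash the real
    pipeline uses, and every categorical feature gets the value 1.
    """
    size = 1 << num_bits  # 2**num_bits possible feature indices
    features = []
    for field, raw in sorted(row.items()):
        # Hash "field=value" so equal raw values in different fields
        # are unlikely to share an index.
        key = f"{field}={raw}".encode("utf-8")
        index = zlib.crc32(key) % size
        features.append(f"{field}:{index}:1")
    return features


# Made-up categorical values in the style of the Criteo columns.
example = to_ffm_features({1: "68fd1e64", 2: "80e26c9b"})
print(example)
```

Each output string carries the field id first, then the hashed index, then the value, so the field information needed by FFM survives the hashing step.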
>>
>> Thanks,
>> Makoto
>>
>> 2017-10-18 0:59 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
>> > Makoto,
>> >
>> > I am using the default hyper-parameters in addition to the following
>> > settings:
>> >
>> > feature_hashing: 20
>> > classification is enabled
>> > Iterations = 10
>> > K = 2, another test using K = 4
>> > Opt: FTRL (default)
>> >
>> > I tried setting the initial learning rate to 0.2 and the optimizer to
>> > AdaGrad, with no significant change in the empirical loss.
>> >
>> > Thanks
>> >
>> > On Tue, Oct 17, 2017 at 6:51 PM, Makoto Yui <myui@apache.org> wrote:
>> >>
>> >> The empirical loss (cumulative logloss) is too large.
>> >>
>> >> The simple test in FieldAwareFactorizationMachineUDTFTest shows that
>> >> empirical loss is decreasing properly but it seems optimization is not
>> >> working correctly in your case.
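To see why the reported value looks too large, the cumulative loss can be divided by the number of training examples. The sketch below uses only the numbers from the quoted mapper log. It assumes `curLosses` is a plain sum of per-example log losses, as the "cumulative logloss" wording suggests, and the change-rate formula is inferred from the log values rather than taken from the Hivemall source.

```python
import math

# Values copied from the quoted mapper log (Iteration #2).
cur_losses = 1.5967339372694769e10
prev_losses = 4.182558816480771e10
n_examples = 4479491


def mean_loss(cumulative_loss, n):
    """Average per-example loss, assuming the counter is a plain sum."""
    return cumulative_loss / n


def change_rate(prev, cur):
    """Relative decrease between iterations (formula inferred from the log)."""
    return (prev - cur) / prev


avg = mean_loss(cur_losses, n_examples)
# A classifier that always predicts p = 0.5 has log loss ln(2) per example.
baseline = math.log(2.0)
rate = change_rate(prev_losses, cur_losses)

print(f"average loss per example: {avg:.2f}")
print(f"coin-flip baseline:       {baseline:.4f}")
print(f"change rate:              {rate:.10f}")
```

Under these assumptions the average comes out in the thousands per example, while an uninformed prediction of 0.5 costs only about 0.69, so the loss is falling between iterations but from a level that a working log-loss optimization should not reach.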
>> >>
>> >> Could you show me the training hyperparameters?
>> >>
>> >> Makoto
>> >>
>> >> 2017-10-17 19:01 GMT+09:00 Shadi Mari <shadimari@gmail.com>:
>> >> > Hello,
>> >> >
>> >> > I am trying to understand the results produced by FFM on each
>> >> > iteration during training on the Criteo 2014 dataset.
>> >> >
>> >> > Basically, I have 10 mappers running concurrently (each has ~4.5M
>> >> > records), and the following is the output of one of the mappers:
>> >> >
>> >> > -----------------------------
>> >> >
>> >> > fm.FactorizationMachineUDTF|: Wrote 4479491 records to a temporary file for iterative training: hivemall_fm392724107368114556.sgmt (2.02 GiB)
>> >> > Iteration #2 [curLosses=1.5967339372694769E10, prevLosses=4.182558816480771E10, changeRate=0.6182399322209704, #trainingExamples=4479491]
>> >> >
>> >> > -----------------------------
>> >> >
>> >> > Looking at the source code, the FFM implementation uses the LogLoss
>> >> > performance metric when classification is specified; however, the
>> >> > curLosses counter is very high: 1.5967339372694769E10
>> >> >
>> >> >
>> >> > What does this mean?
>> >> >
>> >> > Regards
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Makoto YUI <myui AT apache.org>
>> >> Research Engineer, Treasure Data, Inc.
>> >> http://myui.github.io/
>> >
>> >
>>
>>
>>
>> --
>> Makoto YUI <myui AT apache.org>
>> Research Engineer, Treasure Data, Inc.
>> http://myui.github.io/
>
>



-- 
Makoto YUI <myui AT apache.org>
Research Engineer, Treasure Data, Inc.
http://myui.github.io/
