mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: factorization machines as new project
Date Thu, 11 Apr 2013 22:12:50 GMT
One easy thing to do is to build an adjoined matrix type that does the
concatenation on the fly.
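
A minimal sketch of such an adjoined view, in plain Java. The class name AdjoinedMatrix, its methods, and the use of rectangular double[][] blocks are assumptions made for illustration; this does not use Mahout's own Matrix API and is not code from the thread. It only shows the idea: keep the feature groups separate, record a column offset for each, and translate a global column index to (block, local column) on each read, so nothing is copied.

```java
/**
 * Read-only view that presents several row-aligned blocks as one wide matrix.
 * Hypothetical illustration: blocks are assumed rectangular with equal row counts.
 */
final class AdjoinedMatrix {
    private final double[][][] blocks; // blocks[b][row][localCol]
    private final int[] offsets;       // offsets[b] = first global column of block b
    private final int rows;
    private final int cols;

    AdjoinedMatrix(double[][]... blocks) {
        if (blocks.length == 0) {
            throw new IllegalArgumentException("need at least one block");
        }
        this.blocks = blocks;
        this.rows = blocks[0].length;
        this.offsets = new int[blocks.length];
        int total = 0;
        for (int b = 0; b < blocks.length; b++) {
            if (blocks[b].length != rows) {
                throw new IllegalArgumentException("block " + b + " has a different row count");
            }
            offsets[b] = total;
            total += rows == 0 ? 0 : blocks[b][0].length;
        }
        this.cols = total;
    }

    int numRows() { return rows; }

    int numCols() { return cols; }

    /** Looks up a cell by global column index without materializing the concatenation. */
    double get(int row, int col) {
        if (col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("column " + col);
        }
        // Find the last block whose starting offset is <= col.
        int b = offsets.length - 1;
        while (offsets[b] > col) {
            b--;
        }
        return blocks[b][row][col - offsets[b]];
    }
}
```

With a two-column user block and a three-column item block, the view reports five columns, and global column 4 reads local column 2 of the item block. A linear scan over the offsets is enough for a handful of groups; a binary search would be the obvious change if there were many.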




On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gkhncpn@gmail.com> wrote:

> Yeah, it is simpler indeed.
>
> I am going to think about alternative ways to make concatenation easier
> for clients.
>
> Thanks for your review
>
>
> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <robin.anil@gmail.com> wrote:
>
>> I would have folded them all as different feature ids in a single vector;
>> that makes things a lot simpler and faster.
>>
>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>
>>
>> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>>
>>> Hi Robin,
>>>
>>> If you are asking why they are arrays, it is to save clients from
>>> concatenating multiple matrices to create the input.
>>>
>>> I am quoting from the libFM paper <http://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf>:
>>> "For easier interpretation,
>>> the features are grouped into indicators for the active user (blue),
>>> active item (red), other movies rated
>>> by the same user (orange), the time in months (green), and the last
>>> movie rated (brown)."
>>>
>>> I thought a client would create multiple groups of matrices and could
>>> just pass them all to the algorithm.
>>>
>>> Then wModel holds the w parameters; it is still an array of vectors so
>>> that I can keep the indexing consistent, and vModel holds the V parameters.
>>>
>>> Was that what you were asking?
>>>
>>>
>>> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <robin.anil@gmail.com> wrote:
>>>
>>>> Comments away. I was a bit confused by the use of Vector[] for w1 and
>>>> Matrix[] for inputs.
>>>>
>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>>
>>>>
>>>> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>>>>
>>>>> Ted,
>>>>> Robin,
>>>>>
>>>>> Although I have not tested it on a dataset yet, I have recently been
>>>>> implementing Factorization Machines with SGD optimization.
>>>>>
>>>>> The initial implementation is at
>>>>> https://github.com/gcapan/mahout/tree/fm
>>>>>
>>>>> Would you consider taking a look so I can improve it and get it
>>>>> running?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nkechi.nnadi@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm a long-time lurker. I would be interested in implementing these. I
>>>>>> thought I would get my feet wet by contributing tutorials to the wiki,
>>>>>> since I have used Mahout for recommendation and clustering in my
>>>>>> dissertation. I have never contributed code before and I would love to
>>>>>> start now.
>>>>>>
>>>>>> -Nkechi
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <robin.anil@gmail.com> wrote:
>>>>>>
>>>>>> > FMs work really well for a whole range of things. Having implemented them
>>>>>> > myself, I can extend my services as a reviewer if anyone is willing to
>>>>>> > start on it.
>>>>>> >
>>>>>> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>>>> >
>>>>>> >
>>>>>> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>> >
>>>>>> > > Relative to Dan's recent mention of SOM as a possible new project, here
>>>>>> > > are slides from KDD Cup 2012 in which Steffen Rendle describes how he did
>>>>>> > > using a very straightforward implementation of Factorization Machines
>>>>>> > > [1,2].
>>>>>> > >
>>>>>> > > FMs are interesting in the context of Mahout because they can be used in
>>>>>> > > a wide variety of settings, including recommendation and targeting, and
>>>>>> > > because they have very good performance on a number of tasks.
>>>>>> > >
>>>>>> > > I should mention that Robin was the one who first mentioned FMs to me.
>>>>>> > >
>>>>>> > > The KDD 2012 competition [3] is of interest in any case because it
>>>>>> > > provides a large amount of realistic data for commercially important
>>>>>> > > problems.
>>>>>> > >
>>>>>> > > [1] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>>>>>> > >
>>>>>> > > [2] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>>>>>> > >
>>>>>> > > [3] http://www.kddcup2012.org/
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Gokhan
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Gokhan
>>>
>>
>>
>
>
> --
> Gokhan
>
