mahout-user mailing list archives

From deneche abdelhakim <adene...@gmail.com>
Subject Kernel Ridge Regression
Date Wed, 21 Sep 2011 03:05:16 GMT
I am forwarding this to user@ too

Another question: in the paper, KRR had some serious limitations concerning
the size of the dataset it could handle. How much data can MAHOUT-702
handle, and on what PC (or cluster) configuration?


On Wed, Sep 21, 2011 at 3:44 AM, deneche abdelhakim <adeneche@gmail.com> wrote:

> cool, thanks :)
>
>
> On Tue, Sep 20, 2011 at 11:10 PM, Hector Yee <hector.yee@gmail.com> wrote:
>
>> Yeah, it's a two-line change to PassiveAggressive.java (MAHOUT-702)
>>
>> change the loss to:
>>
>> loss = hinge(|score - actual| - epsilon)
>> where hinge(x) = 0 if x < 0, x otherwise
>> epsilon is a new param that controls how much error we tolerate
>> tau remains the same
>> delta = sign(actual - score) * tau * instance
>>
>>
>> On Tue, Sep 20, 2011 at 2:21 PM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>>
>> > Anything that requires the solution of large linear systems is usually
>> > susceptible to SGD approaches.
>> >
>> > On Tue, Sep 20, 2011 at 11:24 AM, deneche abdelhakim <adeneche@gmail.com> wrote:
>> >
>> > > I was reading this paper:
>> > >
>> > > "Combining Predictions for Accurate Recommender Systems"
>> > > http://www.commendo.at/UserFiles/commendo/File/kdd2010-paper.pdf
>> > >
>> > > and one particular method used to blend different recommenders is KRR
>> > > (Kernel Ridge Regression). The authors reached the following
>> > > conclusion about it:
>> > >
>> > > "KRR is worse than neural networks, but the results are promising. An
>> > > increase of the training set size would lead to a more accurate model.
>> > > But the huge computational requirements of KRR limits us to about 6%
>> > > data. The train time for one KRR model on 6% subset (about 42000
>> > > samples) is 4 hours."
>> > >
>> > > I don't know why, but I really want to see the quality of the results
>> > > of this method when using larger training sets. So my questions are
>> > > the following: would such a method benefit from a distributed
>> > > (MapReduce) version? Is such a thing already available? Is it of
>> > > interest to the Mahout project in general? I started reading up on it,
>> > > and it seems to require solving some big linear systems.
>> > >
>> >
>>
>>
>>
>> --
>> Yee Yang Li Hector <https://plus.google.com/106746796711269457249>
>> Professional Profile <http://www.linkedin.com/in/yeehector>
>> http://hectorgon.blogspot.com/ (tech + travel)
>> http://hectorgon.com (book reviews)
>>
>
>
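
The loss change Hector describes in the thread above can be sketched as a single online update step. This is only a minimal illustration in Python rather than Mahout's Java, and it is not the actual MAHOUT-702 patch. The function names, the default values of `epsilon` and `c`, and the rule tau = min(c, loss / ||instance||^2) (the PA-I variant from the passive-aggressive literature) are assumptions made here; the thread only says that tau stays as it was.

```python
def hinge(x):
    """hinge(x) = 0 if x < 0, x otherwise (as defined in the thread)."""
    return x if x > 0.0 else 0.0


def pa_regression_step(weights, instance, actual, epsilon=0.1, c=1.0):
    """Return the weights after one passive-aggressive regression update.

    weights, instance: equal-length lists of floats (hypothetical layout).
    actual: the target value for this instance.
    epsilon: how much absolute error is tolerated before updating.
    c: cap on the step size tau (assumed PA-I aggressiveness parameter).
    """
    score = sum(w * x for w, x in zip(weights, instance))

    # Epsilon-insensitive loss: zero while the prediction is within
    # epsilon of the target, linear in the excess error otherwise.
    loss = hinge(abs(score - actual) - epsilon)

    norm_sq = sum(x * x for x in instance)
    if loss == 0.0 or norm_sq == 0.0:
        # Passive case: error is tolerated (or the instance is all zeros).
        return list(weights)

    # Assumed PA-I step size; a different tau rule can be swapped in here.
    tau = min(c, loss / norm_sq)

    # delta = sign(actual - score) * tau * instance
    sign = 1.0 if actual > score else -1.0
    return [w + sign * tau * x for w, x in zip(weights, instance)]
```

With a large `c`, one step moves the prediction exactly onto the edge of the epsilon tube around the target: starting from weights `[0.0, 0.0]` with instance `[1.0, 2.0]`, target `5.0` and `epsilon=0.5`, the updated weights score 4.5. Because each step touches only one instance, the cost per example is linear in the number of features, which is the appeal of the SGD-style route Ted mentions over solving one large linear system.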
