mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: ALS-WR on Million Song dataset
Date Mon, 18 Mar 2013 17:12:11 GMT
You should also be aware that the alpha parameter comes from a formula
the authors introduce to measure the "confidence" in the observed values:

confidence = 1 + alpha * observed_value
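Here is a minimal Python sketch of this formula. The function names are hypothetical and this is not Mahout code. It shows how strongly alpha scales raw play counts. It also includes the binary preference that the confidence weights in the implicit-feedback paper: 1 if the user played the song at all, 0 otherwise. With alpha=40, a song played 5 times gets a confidence of 201. With alpha=1 it gets 6. An unplayed song stays at 1 either way.

```python
def preference(play_count: float) -> int:
    """Binary preference p: 1 if the song was played at least once, else 0."""
    return 1 if play_count > 0 else 0


def confidence(play_count: float, alpha: float) -> float:
    """Linear confidence c = 1 + alpha * r, where r is the raw observation (here a play count)."""
    return 1.0 + alpha * play_count


if __name__ == "__main__":
    for alpha in (1.0, 40.0):
        # Confidence for unplayed, once-played, 5x-played and 100x-played songs.
        print(alpha, [confidence(r, alpha) for r in (0, 1, 5, 100)])
```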

You can also change that formula in Mahout's code to something you find a
better fit; the paper even suggests alternative variants.
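One such variant in "Collaborative Filtering for Implicit Feedback Datasets" replaces the linear term with a logarithmic one: c = 1 + alpha * log(1 + r / epsilon). This compresses very large observations, such as songs played hundreds of times. The Python sketch below compares the two variants. epsilon is a scaling constant that would need tuning alongside alpha. The function names are hypothetical, and the sketch is an illustration, not a patch for Mahout.

```python
import math


def linear_confidence(r: float, alpha: float) -> float:
    """c = 1 + alpha * r: grows linearly with the raw count r."""
    return 1.0 + alpha * r


def log_confidence(r: float, alpha: float, epsilon: float) -> float:
    """c = 1 + alpha * log(1 + r / epsilon): grows only logarithmically in r."""
    return 1.0 + alpha * math.log1p(r / epsilon)


if __name__ == "__main__":
    alpha, epsilon = 1.0, 1.0  # illustrative values, not recommendations
    for plays in (0, 1, 10, 100, 1000):
        print(plays, linear_confidence(plays, alpha), round(log_confidence(plays, alpha, epsilon), 3))
```

Both variants give an unplayed song a confidence of exactly 1, so only the weighting of observed plays changes.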

Best,
Sebastian


On 18.03.2013 18:06, Han JU wrote:
> Thanks for the quick responses.
> 
> Yes, it's that dataset. What I'm using are triplets of "user_id song_id
> play_times" for ~1M users. No audio data, just plain-text triplets.
> 
> It seems to me that the paper about "implicit feedback" matches this
> dataset well: no explicit ratings, just the number of times each song was played.
> 
> Thanks, Sean, for the alpha value. I think they use big numbers because
> the values in their R matrix are big.
> 
> 
> 2013/3/18 Sebastian Schelter <ssc.open@googlemail.com>
> 
>> JU,
>>
>> are you referring to this dataset?
>>
>> http://labrosa.ee.columbia.edu/millionsong/tasteprofile
>>
>> On 18.03.2013 17:47, Sean Owen wrote:
>>> One word of caution: there are at least two papers on ALS, and they
>>> define lambda differently. I think you are talking about "Collaborative
>>> Filtering for Implicit Feedback Datasets".
>>>
>>> I've been working with some folks who point out that alpha=40 seems to be
>>> too high for most data sets. After running some tests on common data sets,
>>> alpha=1 looks much better. YMMV.
>>>
>>> In the end you have to evaluate these two parameters, and the # of
>>> features, across a range to determine what's best.
>>>
>>> Is this data set not a bunch of audio features? I am not sure it works
>>> for ALS, not naturally at least.
>>>
>>>
>>> On Mon, Mar 18, 2013 at 12:39 PM, Han JU <ju.han.felix@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm wondering whether anyone has tried the ParallelALS job with implicit
>>>> feedback on the Million Song Dataset. Any pointers on alpha and lambda?
>>>>
>>>> In the paper alpha is 40 and lambda is 150, but I don't know what their
>>>> r values in the matrix are. They said the values are based on the time units
>>>> users spent watching a show, so maybe they're big.
>>>>
>>>> Many thanks!
>>>> --
>>>> JU Han
>>>>
>>>> UTC - Université de Technologie de Compiègne
>>>> GI06 - Fouille de Données et Décisionnel (Data Mining and Decision Support)
>>>>
>>>> +33 0619608888
>>>>
>>>
>>
>>
> 
> 

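Sean's caution in the quoted thread that the ALS papers define lambda differently can be made concrete. Below is a minimal NumPy sketch of the per-user least-squares step from "Collaborative Filtering for Implicit Feedback Datasets": x_u = (Y^T C^u Y + lambda I)^-1 Y^T C^u p(u). This is an illustration, not Mahout's ParallelALS code, and the function name `solve_user_factors` and the `weighted_lambda` flag are hypothetical. With the flag off, lambda is used as-is, as in the implicit-feedback paper. With it on, lambda is multiplied by n_u, the user's number of observed items. That is the weighted-lambda regularization of the ALS-WR paper (Zhou et al., "Large-scale Parallel Collaborative Filtering for the Netflix Prize"). The same lambda value therefore means a very different regularization strength under each definition.

```python
import numpy as np


def solve_user_factors(Y, r_u, alpha, lam, weighted_lambda=False):
    """One ALS half-step for a single user, with the item factors Y held fixed.

    Y               : (n_items, k) item-factor matrix.
    r_u             : (n_items,) raw observations for this user, 0 meaning "not observed".
    alpha           : confidence scale, c = 1 + alpha * r.
    lam             : regularization parameter.
    weighted_lambda : if True, multiply lam by n_u, the user's number of observed
                      items (ALS-WR style); if False, use lam as-is.
    """
    r_u = np.asarray(r_u, dtype=float)
    p = (r_u > 0).astype(float)  # binary preferences p(u)
    c = 1.0 + alpha * r_u        # confidences, 1 for unobserved items
    k = Y.shape[1]
    reg = lam * np.count_nonzero(r_u) if weighted_lambda else lam
    # Normal equations: (Y^T C^u Y + reg * I) x_u = Y^T C^u p(u).
    # Computed over all items for clarity; the paper speeds this up via
    # Y^T Y + Y^T (C^u - I) Y, which only touches observed items.
    A = Y.T @ (c[:, None] * Y) + reg * np.eye(k)
    b = Y.T @ (c * p)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(6, 3))                      # toy item factors
    r_u = np.array([5.0, 0.0, 0.0, 1.0, 0.0, 12.0])  # toy play counts
    print(solve_user_factors(Y, r_u, alpha=1.0, lam=0.1))
    print(solve_user_factors(Y, r_u, alpha=1.0, lam=0.1, weighted_lambda=True))
```

For a user with hundreds of played songs, the weighted variant regularizes far more strongly. A lambda tuned under one definition is therefore unlikely to transfer to the other.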

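Sean's advice in the quoted thread is to evaluate alpha, lambda and the number of features across a range, which amounts to a grid search. Here is a minimal Python sketch. It assumes a hypothetical caller-supplied `evaluate(alpha, lam, num_features)` that trains a model and returns a held-out score where higher is better; a lower-is-better metric such as expected percentile rank would be negated first. Nothing here is a Mahout API. The example grid simply spans the values mentioned in the thread (alpha 1 and 40, lambda 150), and the stand-in objective exists only to make the snippet runnable.

```python
from itertools import product


def grid_search(evaluate, alphas, lambdas, feature_counts):
    """Evaluate every (alpha, lambda, num_features) combination and return the best.

    `evaluate` is a caller-supplied function (hypothetical here) that trains a model
    with the given hyperparameters and returns a held-out score, higher being better.
    """
    best_params, best_score = None, float("-inf")
    for alpha, lam, k in product(alphas, lambdas, feature_counts):
        score = evaluate(alpha, lam, k)
        if score > best_score:
            best_params, best_score = (alpha, lam, k), score
    return best_params, best_score


if __name__ == "__main__":
    # Stand-in objective for demonstration only; replace with real training + evaluation.
    def toy_evaluate(alpha, lam, k):
        return -abs(alpha - 1.0) - abs(lam - 0.1) - abs(k - 20) / 100

    print(grid_search(toy_evaluate, [1.0, 10.0, 40.0], [0.01, 0.1, 150.0], [10, 20, 50]))
```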