mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Any way Mahout overcome the data sparsity problem ?
Date Tue, 03 Apr 2012 00:40:32 GMT
This problem is much more commonly referred to as the cold start problem, and it is far smaller
than many authors assume. Typically a dozen good interactions are plenty to get good recommendation
performance, and half a dozen suffice to do pretty well.

Obviously, if you are using ratings, then most of your audience will never give you that much
data.  If you use implicit data, then you are likely to get that much data in the first few
minutes of use, and you can accelerate even that with good UI design.

There is still a small cold start problem, even if it is much smaller than some assume.  Typically,
this can be dealt with using an anonymous model, a semi-anonymous model, or a combination of
the two.  Both are supported in Mahout.
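
To make the idea concrete, here is a minimal sketch of how an anonymous visitor with only a couple of implicit interactions can still get recommendations. It is plain Python, not Mahout's API; the function names, the toy click histories, and the simple summed co-occurrence scoring are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend_anonymous(cooccurrence, session_items, top_n=3):
    """Score unseen items by summed co-occurrence with the session's items."""
    seen = set(session_items)
    scores = defaultdict(int)
    for item in seen:
        for other, count in cooccurrence.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Sort by score descending, then by item name for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical implicit data: items each known user clicked or viewed.
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["b", "c", "d"],
    "u4": ["a", "c", "e"],
}
cooc = build_cooccurrence(histories)

# An anonymous visitor who has touched only two items so far.
print(recommend_anonymous(cooc, ["a", "b"]))
```

The anonymous session is never added to the stored histories; it is only used as a temporary set of preferences at recommendation time, which is the general shape of an anonymous-user model.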

Sent from my iPhone

On Apr 2, 2012, at 4:49 PM, ziad kamel <ziad.kamel25@gmail.com> wrote:

> CF suffers from the data sparsity problem, where users only rate a
> small set of items. That makes the computation of similarity between
> users imprecise and consequently reduces the accuracy of CF
> algorithms.
> http://www.jucs.org/jucs_17_4/a_clustering_approach_for
> 
> 
> 
> On Sun, Apr 1, 2012 at 1:20 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Could you say a bit more about what you mean?  Which data sparsity problem?
>> 
>> Sent from my iPhone
>> 
>> On Apr 1, 2012, at 6:35 AM, ziad kamel <ziad.kamel25@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> Is there any ways that mahout CF can overcome the data sparsity problem?
>>> 
>>> Thanks
