mahout-user mailing list archives

From Steven Bourke <sbou...@gmail.com>
Subject Re: Available datasets for recommendations
Date Fri, 08 Jul 2011 10:31:30 GMT
MovieLens would be the one that's most commonly used by researchers; they
have 100k, 1 million, and 10 million ratings datasets.


On Fri, Jul 8, 2011 at 10:26 AM, Lance Norskog <goksron@gmail.com> wrote:

> Thanks.
>
> Netflix & Yahoo KDD were my first choice, but are gone. It did not
> occur to me that stashing such things away would be wise; packrat
> though I am.
>
> Purpose is testing large user/item or document/term databases.
>
> On Fri, Jul 8, 2011 at 12:44 AM, Sebastian Schelter <ssc@apache.org>
> wrote:
> > Another dataset to play with is this compilation of song listenings scraped
> > from the last.fm API:
> >
> > http://mtg.upf.edu/node/1671.
> >
> > Should include about 20M ratings.
> >
> > --sebastian
> >
> > On 08.07.2011 09:17, Sean Owen wrote:
> >>
> >> The link is http://www.occamslab.com/petricek/data/
> >>
> >> The KDD or Netflix data are plenty big to play with. How big is big for
> >> your purpose?
> >>
> >> On Fri, Jul 8, 2011 at 7:05 AM, web service<wbsrvc@gmail.com>  wrote:
> >>
> >>> Is it taken offline as well ?
> >>>
> >>> On Thu, Jul 7, 2011 at 10:40 PM, Alex Kozlov <alexvk@cloudera.com> wrote:
> >>>
> >>>> There is still a libimseti dataset
> >>>> http://www.occamslab.com/petricek/data with 17,359,346 ratings. People
> >>>> are scared after the Netflix lawsuit.
> >>>>
> >>>> On Thu, Jul 7, 2011 at 10:17 PM, Ted Dunning<ted.dunning@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Those are both reasonably large, but not commercial in scale.
> >>>>>
> >>>>> At Veoh, we had about 10 non-zero elements in our raw data.  I think
> >>>>> Netflix has 100 million.
> >>>>>
> >>>>> On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog <goksron@gmail.com> wrote:
> >>>>>
> >>>>>> What recommendation datasets, that are available, are considered
> >>>>>> "large" by Mahout testing standards? Yahoo KDD Cup is offline, the
> >>>>>> Netflix data went under a cloud...
> >>>>>>
> >>>>>> --
> >>>>>> Lance Norskog
> >>>>>> goksron@gmail.com
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
