mahout-dev mailing list archives

From "Lance Norskog (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-676) Random samplers in a modular library
Date Mon, 30 May 2011 02:41:47 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated MAHOUT-676:
---------------------------------

    Description: 
This is a modular suite of samplers. It lets you throw away samples in a controlled way, so
that the samples you keep follow a useful distribution.

Here is a use case: for my recommendations, I want user activity to determine how much
influence each user has on the results. Grouping users by how many movies they watched:
users with 1-5 movies should make up 20% of the sample, 6-15 movies 50%, and 16-30 movies
30%. Users who watched more than 30 movies are not useful.
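
The bucketing step of this use case can be sketched in Java. This is a minimal sketch, not code from the MAHOUT-676 patch: `ActivityBuckets`, `bucketFor`, and `TARGET_PERCENTS` are hypothetical names. It assigns a user with exactly 15 movies to the 6-15 bucket and starts the third bucket at 16.

```java
/** Hypothetical bucketing of users by watch count (not part of the MAHOUT-676 patch). */
final class ActivityBuckets {
    // Target share of the final sample, in percent, for buckets 1-5, 6-15 and 16-30 movies.
    static final int[] TARGET_PERCENTS = {20, 50, 30};

    private ActivityBuckets() {}

    /** Returns the bucket index for a user's watch count, or -1 if the user should be discarded. */
    static int bucketFor(int moviesWatched) {
        if (moviesWatched >= 1 && moviesWatched <= 5) {
            return 0;
        } else if (moviesWatched >= 6 && moviesWatched <= 15) {
            return 1;
        } else if (moviesWatched >= 16 && moviesWatched <= 30) {
            return 2;
        }
        return -1; // outside 1-30 movies: not useful for this sample
    }
}
```

Users that map to -1 are simply skipped while reading the input.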

* If I know the input distribution, I can supply a function to the Slice sampler to give this
distribution. 
* If I don't know the distribution, I can create a Reservoir sampler for each of the three
buckets. After reading the whole set, I check the sizes of the various buckets and solve for
my distribution. This gives the number of users to pull from each bucket.
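
The second approach can be sketched as follows. This is a minimal sketch under stated assumptions, not the API of the attached patch. It assumes one fixed-capacity reservoir per bucket using the classic Algorithm R, with integer percentage targets. It reads "solve for my distribution" as finding the largest total sample N for which every bucket can supply its share. `Reservoir`, `BucketAllocation`, `usersPerBucket`, and `take` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical fixed-capacity reservoir sampler (Algorithm R); not the MAHOUT-676 API. */
final class Reservoir<T> {
    private final int capacity;
    private final Random random;
    private final List<T> items = new ArrayList<>();
    private int seen = 0; // how many items have been offered so far

    Reservoir(int capacity, Random random) {
        this.capacity = capacity;
        this.random = random;
    }

    /** After n offers, each offered item is retained with probability min(1, capacity / n). */
    void offer(T item) {
        seen++;
        if (items.size() < capacity) {
            items.add(item);
        } else {
            int j = random.nextInt(seen); // uniform in [0, seen)
            if (j < capacity) {
                items.set(j, item); // replace a uniformly chosen resident
            }
        }
    }

    int seen() { return seen; }

    int size() { return items.size(); }

    /** Returns up to m retained items chosen uniformly (shuffles a copy, then takes a prefix). */
    List<T> take(int m) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, random);
        return copy.subList(0, Math.min(m, copy.size()));
    }
}

/** Hypothetical helper that turns bucket sizes and target percentages into draw counts. */
final class BucketAllocation {
    private BucketAllocation() {}

    /**
     * sizes[i] = users available in bucket i; percents[i] = target share (summing to 100).
     * Picks the largest total N with percents[i] * N / 100 <= sizes[i] for every bucket,
     * then returns counts[i] = percents[i] * N / 100 (integer division, so never too many).
     */
    static long[] usersPerBucket(long[] sizes, int[] percents) {
        long total = Long.MAX_VALUE;
        for (int i = 0; i < sizes.length; i++) {
            if (percents[i] > 0) {
                total = Math.min(total, sizes[i] * 100 / percents[i]);
            }
        }
        long[] counts = new long[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            counts[i] = percents[i] * total / 100;
        }
        return counts;
    }
}
```

After the pass, the retained size of each reservoir is passed as `sizes`, and `take(counts[i])` draws that bucket's users. For example, with bucket sizes 1000, 1500 and 600 and targets of 20%, 50% and 30%, the third bucket is the bottleneck (600 / 0.30 = 2000 users in total), so the sketch draws 400, 1000 and 600 users.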

  was:This is a modular suite of samplers.


> Random samplers in a modular library
> ------------------------------------
>
>                 Key: MAHOUT-676
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-676
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Lance Norskog
>            Priority: Minor
>         Attachments: MAHOUT-676.patch, Sampler.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
