lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3673) Random variate functions
Date Wed, 25 Jul 2012 05:51:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422023#comment-13422023 ]

Hoss Man commented on SOLR-3673:
--------------------------------

{quote}
bq. I think at a minimum we should probably add a "seed" argument to all of these functions

...
That mostly makes sense. I am not sure what to do if an RNG is used that needs more seed data
than the end user provides. At the moment I am using the Mersenne Twister, which requires 128 bits
of seed data, and I am nervous about exposing the particulars of the underlying RNG or its seeding.
{quote}

This is where my total ignorance of these random generators and how they are used comes in: it
looked to me like the generators in your patch just took a java.util.Random as input
-- is there a particular reason why the Mersenne Twister needs to be used? What does that
give us that java.util.Random doesn't?

FWIW: 128 bits isn't that much if you let the seed argument to the function be an arbitrary
String -- even if you ignore the high bits of each char, the user just needs to give you 16 chars
(fewer if we include stuff like the index version).
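
To make the arithmetic concrete, here is a rough sketch of two ways a String seed could be turned
into 16 bytes (128 bits). None of this is from the patch: the class and method names are made up
for illustration, and the SHA-256 variant is a swapped-in technique (hashing the string together
with a salt such as an index version) rather than anything proposed above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of SOLR-3673.patch.
final class SeedFromString {

    // Literal reading of "16 chars, ignoring the high bits":
    // keep only the low 8 bits of each of the first 16 chars.
    static byte[] lowBytes(String seed) {
        if (seed.length() < 16) {
            throw new IllegalArgumentException("need at least 16 chars for 128 bits");
        }
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            out[i] = (byte) seed.charAt(i); // the cast drops the high 8 bits of the char
        }
        return out;
    }

    // Assumed alternative: hash the seed plus a salt (e.g. an index version)
    // and keep the first 16 bytes, so a seed of any length works.
    static byte[] hashed(String seed, long salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed.getBytes(StandardCharsets.UTF_8));
            md.update(Long.toString(salt).getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(md.digest(), 16);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }
}
```

Either way the same seed string always yields the same 16 bytes, which is all the "seed" argument
needs for reproducible results; neither variant makes a guessable seed any less guessable.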

This is kind of where my "use case" question comes into play as well ... if the goal is just
to use these generators to get a "biased" shuffling of the docs (i.e. maybe you use a certain
random distribution and then apply an frange filter on it to get a set of documents with a roughly
predictable size) then it's not that bad if the seeds aren't very complex -- throw in the SolrCore
start time to get a few more bits, etc. But if there is some sort of cryptography goal then
obviously having a "good" random seed that is unpredictable is a lot more important.
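
As a toy illustration of the "roughly predictable size" idea, the sketch below draws one uniform
value per document from a seeded java.util.Random and keeps the documents whose value falls in a
range, which is the effect a range filter over a random function would have. It is a plain-Java
simulation, not Solr code; the class name, the doc numbering, and the choice of a uniform
distribution are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical simulation; it does not touch any Solr API.
final class SeededSample {

    // Returns the ids of docs whose random value lies in [lower, upper).
    static List<Integer> sample(int numDocs, long seed, double lower, double upper) {
        Random rng = new Random(seed); // same seed => same sequence => same sample
        List<Integer> kept = new ArrayList<>();
        for (int doc = 0; doc < numDocs; doc++) {
            double value = rng.nextDouble(); // uniform in [0, 1)
            if (value >= lower && value < upper) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

With 10,000 docs and the range [0, 0.1), the sample should hold about 1,000 docs, give or take
sampling noise, and repeating the call with the same seed returns the same docs.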


                
> Random variate functions
> ------------------------
>
>                 Key: SOLR-3673
>                 URL: https://issues.apache.org/jira/browse/SOLR-3673
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0, 5.0
>            Reporter: Greg Bowyer
>            Assignee: Greg Bowyer
>         Attachments: SOLR-3673.patch
>
>
> Hi all
> At my $DAYJOB I have been asked to build a few random variate functions that return random
> numbers bound to a distribution.
> I think these can be added to Solr.
> I have a hesitation in that the code as written uses / needs Uncommons Maths (because
> we want a far better RNG than Java's and because I am lazy and did not want to write distributions).
> Uncommons Maths is Apache licensed, so we are good on that front.
> Anyone have any thoughts on this?
> For reference the functions are:
> rgaussian(mean, stddev) -> Random value aligned to gaussian distribution
> rpoisson(mean) -> Random value aligned to poisson distribution
> rbinomial(n, prob) -> Random value aligned to binomial distribution
> rcontinous(min, max) -> Random continuous value between min and max
> rdiscrete(min, max) -> Random discrete value between min and max
> rexponential(rate) -> Random value from the exponential distribution
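
For readers unfamiliar with these distributions, the sketch below shows one textbook way each
variate in the list could be drawn from a plain seeded java.util.Random. It is not the patch's
implementation (which, per the description, relies on Uncommons Maths); the class and method names
are invented, the inclusive upper bound in discrete() is an assumption, and the Poisson and
binomial loops are the simple methods that are only practical for small mean / n.

```java
import java.util.Random;

// Hypothetical sketch, not the code in SOLR-3673.patch.
final class VariateSketch {
    private final Random rng;

    VariateSketch(long seed) {
        this.rng = new Random(seed);
    }

    // Scale and shift a standard normal draw.
    double gaussian(double mean, double stddev) {
        return mean + stddev * rng.nextGaussian();
    }

    // Knuth's multiplication method: multiply uniforms until the product drops to e^-mean or below.
    int poisson(double mean) {
        double limit = Math.exp(-mean);
        double product = rng.nextDouble();
        int count = 0;
        while (product > limit) {
            product *= rng.nextDouble();
            count++;
        }
        return count;
    }

    // Count successes over n independent Bernoulli trials.
    int binomial(int n, double prob) {
        int successes = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < prob) {
                successes++;
            }
        }
        return successes;
    }

    // Uniform real value between min and max.
    double continuous(double min, double max) {
        return min + (max - min) * rng.nextDouble();
    }

    // Uniform integer in [min, max]; assumes max - min + 1 does not overflow an int.
    int discrete(int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    // Inverse-transform sampling: -ln(1 - U) / rate, with U in [0, 1).
    double exponential(double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }
}
```

Two instances built with the same seed return identical sequences, which is the property a
user-supplied "seed" argument would rely on.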


