cassandra-commits mailing list archives

From "Daniel Cranford (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
Date Thu, 28 Sep 2017 16:38:02 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184418#comment-16184418 ]

Daniel Cranford commented on CASSANDRA-12744:
---------------------------------------------

I think the math on this is slightly broken. The seed multiplier is intended to scale all
seeds to the 10^22 magnitude. However, the seeds (and the multiplier) are all stored in 64-bit
integers, and the math performed on them is 64-bit math.

10^22 is not representable as a long, which has the range {noformat}[-(2^63) : 2^63 - 1] = [-9,223,372,036,854,775,808 : 9,223,372,036,854,775,807]{noformat}

Consider that for sample sizes of 1084 or less (10^22 / 2^63 ≈ 1084.2), the line that calculates
the sample multiplier

{noformat}this.sampleMultiplier = 1 + Math.round(Math.pow(10D, 22 - Math.log10(sampleSize)));{noformat}
will result in a multiplier of Long.MIN_VALUE: Math.round saturates at Long.MAX_VALUE when its
argument exceeds the long range, and adding 1 then wraps around. Multiplying any long by
Long.MIN_VALUE yields 0 (for even seeds) or Long.MIN_VALUE (for odd seeds), reducing your seeds
to two distinct values.
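The overflow can be reproduced with a short, self-contained Java sketch. It copies the multiplier expression quoted above into a helper with the target exponent as a parameter, and assumes sequential seeds 1..sampleSize; the class and method names (`MultiplierOverflow`, `distinctScaledSeeds`, `firstNonOverflowingSize`) are hypothetical and not part of cassandra-stress.

```java
import java.util.Set;
import java.util.TreeSet;

public class MultiplierOverflow {
    // Same expression as the quoted line, with the target exponent as a parameter.
    static long sampleMultiplier(long sampleSize, int exponent) {
        return 1 + Math.round(Math.pow(10D, exponent - Math.log10(sampleSize)));
    }

    // Scales the (assumed) sequential seeds 1..sampleSize and collects the distinct results.
    static Set<Long> distinctScaledSeeds(long sampleSize, int exponent) {
        long multiplier = sampleMultiplier(sampleSize, exponent);
        Set<Long> distinct = new TreeSet<>();
        for (long seed = 1; seed <= sampleSize; seed++) {
            distinct.add(seed * multiplier); // silent two's-complement wrap-around
        }
        return distinct;
    }

    // Smallest sample size whose multiplier does not wrap to Long.MIN_VALUE.
    static long firstNonOverflowingSize(int exponent) {
        long size = 1;
        while (sampleMultiplier(size, exponent) == Long.MIN_VALUE) {
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(sampleMultiplier(100, 22));    // -9223372036854775808
        System.out.println(distinctScaledSeeds(100, 22)); // [-9223372036854775808, 0]
        System.out.println(firstNonOverflowingSize(22));  // 1085
    }
}
```

Math.round(double) returns Long.MAX_VALUE for any argument at or above that value, so 10^20 (sample size 100) saturates, and the `1 +` is what turns the saturated value into Long.MIN_VALUE.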

I think using 18 instead of 22 as the target exponent should resolve this issue.
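A hedged sketch of why 18 is safe, again assuming (hypothetically) that seeds run from 1 to the sample size. It only checks that the multiplier is positive and that the largest scaled seed fits in a long; with a positive multiplier and no overflow, the scaled seeds are strictly increasing and therefore distinct. `MultiplierWithExponent18` and `scalesCleanly` are illustrative names, not stress code.

```java
public class MultiplierWithExponent18 {
    // Multiplier expression from the comment, with the target exponent as a parameter.
    static long sampleMultiplier(long sampleSize, int exponent) {
        return 1 + Math.round(Math.pow(10D, exponent - Math.log10(sampleSize)));
    }

    // True if seeds 1..sampleSize scale to distinct, positive longs without overflow.
    static boolean scalesCleanly(long sampleSize, int exponent) {
        long multiplier = sampleMultiplier(sampleSize, exponent);
        if (multiplier <= 0) {
            return false; // the multiplier itself wrapped around
        }
        try {
            // The largest seed is sampleSize; if it fits, every smaller seed fits too.
            Math.multiplyExact(sampleSize, multiplier);
            return true;
        } catch (ArithmeticException overflow) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleMultiplier(100, 18));         // 10000000000000001
        System.out.println(scalesCleanly(100, 18));            // true
        System.out.println(scalesCleanly(250_000_000L, 18));   // true
        System.out.println(scalesCleanly(100, 22));            // false
        System.out.println(scalesCleanly(1_000_000L, 22));     // false
    }
}
```

With 22, the check fails for any sample size: small sizes wrap the multiplier itself, and larger ones overflow once the largest seed is multiplied by roughly 10^22 / sampleSize, since the product is about 10^22.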

Additionally, I think the seed population size is being incorrectly calculated as the range
of the revisit distribution (which defaults to uniform(1..1M)). However, when running in the
default sequential seed mode (without revisits), e.g. {noformat}cassandra-stress write n=100{noformat},
the size of the seed population is actually the length of the seed sequence (in this case
100).

Likewise, when running with seeds generated from a distribution, e.g.
{noformat}cassandra-stress read -pop dist=gaussian(1..250M){noformat}, the size of the seed
population is actually the range of the seed distribution (in this case 250 million).
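To make the distinction concrete, here is a hedged Java sketch that models the two seed modes with hypothetical types (`SeedSource`, `Sequential`, `FromDistribution`; none of these are cassandra-stress classes) and compares the largest scaled seed for `write n=100`, using 18 as the target exponent, when the population size is taken from the default revisit range versus the sequence length.

```java
public class SeedPopulation {
    // Hypothetical model of the two seed modes; not the cassandra-stress API.
    interface SeedSource {
        long populationSize();
    }

    // Sequential seeds 1..n, as in "write n=100": the population is the sequence length.
    static final class Sequential implements SeedSource {
        private final long n;
        Sequential(long n) { this.n = n; }
        public long populationSize() { return n; }
    }

    // Seeds drawn from a distribution over min..max, as in "-pop dist=gaussian(1..250M)":
    // the population is the range of that distribution.
    static final class FromDistribution implements SeedSource {
        private final long min;
        private final long max;
        FromDistribution(long min, long max) { this.min = min; this.max = max; }
        public long populationSize() { return max - min + 1; }
    }

    // Multiplier expression from the comment, using 18 as the target exponent.
    static long sampleMultiplier(long sampleSize) {
        return 1 + Math.round(Math.pow(10D, 18 - Math.log10(sampleSize)));
    }

    public static void main(String[] args) {
        long revisitRange = 1_000_000L; // range of the default uniform(1..1M) revisit distribution
        SeedSource write100 = new Sequential(100);
        SeedSource gaussian = new FromDistribution(1, 250_000_000L);

        // Largest scaled seed for "write n=100" under each choice of population size.
        System.out.println(100 * sampleMultiplier(revisitRange));              // 100000000000100
        System.out.println(100 * sampleMultiplier(write100.populationSize())); // 1000000000000000100
        System.out.println(gaussian.populationSize());                          // 250000000
    }
}
```

With the revisit range as the population size, the largest seed of `write n=100` only reaches about 10^14 rather than the roughly 10^18 the multiplier is meant to target.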


> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad. We are using the JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 iterations it's only outputting 3. If you bump it to 10k it hits all 3 values.
> I made a change to just use the default commons math random generator and now see all 3 values for n=10
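One plausible mechanism for the behaviour described in the issue is that `java.util.Random`, which Commons Math 3's `JDKRandomGenerator` is built on, produces nearly identical first outputs for small, nearby seeds. The sketch below assumes, hypothetically, that each iteration draws one value from a generator freshly seeded with the iteration number and maps it to uniform(1..3) by inversion; `FirstDrawBias`, `uniformInt` and `histogram` are illustrative names, and this is not the stress tool's actual sampling path.

```java
import java.util.Random;
import java.util.TreeMap;

public class FirstDrawBias {
    // Maps u in [0, 1) to an integer in lo..hi by inversion (assumed mapping).
    static int uniformInt(double u, int lo, int hi) {
        return lo + (int) (u * (hi - lo + 1));
    }

    // One uniform(1..3) draw per iteration, each from a generator freshly seeded
    // with consecutive seeds firstSeed, firstSeed + 1, ...
    static TreeMap<Integer, Integer> histogram(long firstSeed, int iterations) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (long seed = firstSeed; seed < firstSeed + iterations; seed++) {
            int value = uniformInt(new Random(seed).nextDouble(), 1, 3);
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The first nextDouble() for seeds 1..100 all fall near 0.72-0.73.
        System.out.println(histogram(1, 100)); // {3=100}
    }
}
```

Spreading nearby seeds far apart before seeding, which is what the seed multiplier discussed in the comment is meant to do, is one way to break this correlation.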



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
