cassandra-commits mailing list archives

From "Daniel Cranford (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
Date Tue, 03 Oct 2017 18:14:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190085#comment-16190085 ]

Daniel Cranford commented on CASSANDRA-12744:
---------------------------------------------

As I've thought about how to fix the seed multiplier, I've come to the conclusion that it
is impossible to use an adaptive multiplier without breaking existing functionality or changing
the command line interface.

One of the key reasons you can specify how the seeds get generated is so that you can partition
the seed space and run multiple cassandra-stress processes on different machines in parallel
so the cassandra-stress client doesn't become the bottleneck. For example, to write 2 million partitions
from two client machines, you'd run {noformat}cassandra-stress write n=1000000 -pop seq=1..1000000{noformat}
on one client machine and {noformat}cassandra-stress write n=1000000 -pop seq=1000001..2000000{noformat}
on the other client machine.

An adaptive multiplier that attempts to scale the seed sequence so that its range is 10^22
(or better, Long.MAX_VALUE since seeds are 64 bit longs) would generate the same multiplier
for both client processes resulting in seed sequence overlaps.

To correctly generate an adaptive multiplier, you need global knowledge of the entire range
of seeds being generated by all cassandra-stress processes. This information cannot be supplied
via the current command line interface. The command line interface would have to be updated
in a breaking fashion to support an adaptive multiplier.
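
A minimal sketch of the overlap problem described above. It is not the cassandra-stress implementation: the helper names multiplierFor and scale are hypothetical, and the mapping (position within the range times the multiplier) is only one plausible way an adaptive multiplier might be defined. It compares a multiplier derived from each client's own -pop range with one derived from the global range.

```java
final class AdaptiveMultiplierSketch {
    // Hypothetical: pick a multiplier that stretches [min, max] across
    // roughly [0, Long.MAX_VALUE].
    static long multiplierFor(long min, long max) {
        return Long.MAX_VALUE / (max - min + 1);
    }

    // Hypothetical mapping: offset of the seed within the known range,
    // times the multiplier.
    static long scale(long seed, long rangeMin, long multiplier) {
        return (seed - rangeMin) * multiplier;
    }

    public static void main(String[] args) {
        // Each client only knows its own range.
        long mA = multiplierFor(1, 1_000_000);
        long mB = multiplierFor(1_000_001, 2_000_000);
        System.out.println("same multiplier: " + (mA == mB));

        // Both clients map their k-th seed to the same scaled value.
        long a = scale(500_000, 1, mA);
        long b = scale(1_500_000, 1_000_001, mB);
        System.out.println("overlap: " + (a == b));

        // With global knowledge of 1..2000000 the scaled seeds stay distinct.
        long mG = multiplierFor(1, 2_000_000);
        long lastOfA = scale(1_000_000, 1, mG);
        long firstOfB = scale(1_000_001, 1, mG);
        System.out.println("disjoint: " + (lastOfA < firstOfB));
    }
}
```

Under these assumptions, the two clients only stay disjoint when both are told the global range, which is exactly the information the current -pop option does not carry.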

Using a hardcoded static multiplier is safe, but would reduce the allowable range of seed
values (and thus reduce the maximum number of distinct partition keys). This probably isn't
a big deal since nobody wants to write 2^64 partitions. But it would need to be chosen with
care so that the number of distinct seeds (and thus the number of distinct partitions) doesn't
become too small.
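
To make the trade-off concrete, here is a rough sketch of how a static multiplier caps the number of distinct seeds. It assumes seeds are positive and are scaled by plain multiplication in a signed 64-bit long; the candidate multipliers are arbitrary illustrations, not values proposed for the patch.

```java
final class StaticMultiplierCapacity {
    // Largest seed s for which s * multiplier still fits in a signed long.
    static long maxDistinctSeeds(long multiplier) {
        return Long.MAX_VALUE / multiplier;
    }

    public static void main(String[] args) {
        // Hypothetical candidate multipliers, chosen only for illustration.
        long[] candidates = {1L << 10, 1L << 20, 1L << 32, 1L << 40};
        for (long m : candidates) {
            long max = maxDistinctSeeds(m);
            // multiplyExact throws ArithmeticException on overflow, so
            // reaching the print confirms the largest seed is still safe.
            Math.multiplyExact(max, m);
            System.out.printf("multiplier=%d -> at most %d distinct seeds%n", m, max);
        }
    }
}
```

The larger the multiplier, the fewer seeds fit before the product overflows, so the choice is a balance between spreading seeds apart and keeping enough distinct partitions.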



> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the JDKRandomGenerator()
> but in testing of uniform(1..3) we see for 100 iterations it's only outputting 3.  If you
> bump it to 10k it hits all 3 values.
> I made a change to just use the default commons math random generator and now see all
> 3 values for n=10



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


