cassandra-commits mailing list archives

From "Ben Slater (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12744) Randomness of stress distributions is not good
Date Tue, 30 May 2017 07:19:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028803#comment-16028803 ]

Ben Slater edited comment on CASSANDRA-12744 at 5/30/17 7:18 AM:
-----------------------------------------------------------------

After some more digging, I've come to the conclusion that the issue is that the JDKRandomGenerator
produces random numbers that are close together when it is seeded with values that are close
together. So, when running with a small range of potential seeds (from the population), you end
up with different random doubles that all round to the same long value.
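
The clustering described above can be reproduced outside of stress with a small sketch. This is
an illustration under stated assumptions, not the stress code path: java.util.Random stands in
for JDKRandomGenerator (which extends it in Commons Math 3), the toRange mapping is a guess at
how a uniform(min..max) population truncates a double to a long, and SPREAD is a hypothetical
constant chosen for the demo, not the multiplier used in the attached patch.

```java
import java.util.Random;
import java.util.TreeSet;

// Sketch only: java.util.Random stands in for Commons Math's JDKRandomGenerator.
// The double-to-long mapping and the SPREAD constant are assumptions for illustration.
class SeedClusteringDemo {

    // Hypothetical multiplier; NOT the value used in the attached patch.
    static final long SPREAD = 1L << 35;

    // First double produced by a generator created with the given seed.
    static double firstDouble(long seed) {
        return new Random(seed).nextDouble();
    }

    // Truncate a double in [0, 1) to a long in [min, max] (assumed mapping).
    static long toRange(double d, long min, long max) {
        return min + (long) (d * (max - min + 1));
    }

    // Count the distinct values in [min, max] obtained from seeds 1..seedCount,
    // each seed being scaled by the given multiplier before use.
    static int distinctValues(int seedCount, long multiplier, long min, long max) {
        TreeSet<Long> seen = new TreeSet<>();
        for (long seed = 1; seed <= seedCount; seed++) {
            seen.add(toRange(firstDouble(seed * multiplier), min, max));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("seed 1 -> " + firstDouble(1));
        System.out.println("seed 2 -> " + firstDouble(2));
        System.out.println("uniform(1..5), raw seeds 1..500:    "
                + distinctValues(500, 1L, 1, 5) + " distinct value(s)");
        System.out.println("uniform(1..5), scaled seeds 1..500: "
                + distinctValues(500, SPREAD, 1, 5) + " distinct value(s)");
    }
}
```

With java.util.Random, seeds 1 and 2 both give a first double of about 0.731, and raw seeds
1..500 all truncate to a single value in 1..5, while the scaled seeds reach all five values.
The real generator chain in stress may behave differently in detail.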

The attached patch multiplies the generated seed so that max seed values are of the order
of 10^22. I've tested this against a couple of the failed dtests and they pass OK. In addition,
I get the following results from a range of YAML files ("without multiplier" is unmodified
trunk; "with multiplier" is trunk with this patch applied):

Example 1:
table: test5
table_definition: |
  CREATE TABLE test5 (
        pk int,
        val text,
        PRIMARY KEY (pk)
  ) 
columnspec:
  - name: pk
    size: fixed(64) 
    population: uniform(1..500) 
    
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup  -rate threads=5 -node 127.0.0.1
without multiplier - 47 rows
with multiplier - 490 rows

================================

table: test4
table_definition: |
  CREATE TABLE test4 (
        pk int,
        pk2 text,
        val text,
        PRIMARY KEY ((pk,pk2))
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..5) 
  - name: pk2
    size: fixed(2) 
    population: uniform(1..5) 
    
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup  -rate threads=5 -node 127.0.0.1
without multiplier - 1 row
with multiplier - 25 rows

================================

table: test4
table_definition: |
  CREATE TABLE test4 (
        pk int,
        pk2 text,
        val text,
        PRIMARY KEY ((pk,pk2))
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..500M) 
  - name: pk2
    size: fixed(2) 
    population: uniform(1..5) 
user profile=... ops(insert=1) n=1000 cl=ALL no-warmup  -rate threads=5 -node 127.0.0.1
without multiplier - 1000 rows
with multiplier - 1000 rows

===================================
table: test7
table_definition: |
  CREATE TABLE test7 (
        pk int,
        pk2 text,
        ck1 text,
        val text,
        PRIMARY KEY ((pk,pk2), ck1)
  ) 
columnspec:
  - name: pk
    size: fixed(2) 
    population: uniform(1..100) 
  - name: pk2
    size: fixed(4) 
    population: uniform(1..10000) 
  - name: ck1
    size: fixed(4) 
    population: uniform(1..1000) 

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup  -rate threads=5 -node 127.0.0.1
without multiplier - 10342 rows
with multiplier - 63387 rows

=====================================
table: test7
table_definition: |
  CREATE TABLE test7 (
        pk int,
        pk2 text,
        ck1 text,
        val text,
        PRIMARY KEY ((pk,pk2), ck1)
  ) 
columnspec:
  - name: pk
    size: fixed(4) 
    population: seq(1..100) 
  - name: pk2
    size: fixed(10) 
    population: seq(1..10000) 
  - name: ck1
    size: fixed(10) 
    cluster: uniform(1..1000)
    population: seq(1..1000) 

user profile=... ops(insert=1) n=100000 cl=ALL no-warmup  -rate threads=5 -node 127.0.0.1
without multiplier - 25000 rows
with multiplier - 43304 rows



 



> Randomness of stress distributions is not good
> ----------------------------------------------
>
>                 Key: CASSANDRA-12744
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: T Jake Luciani
>            Assignee: Ben Slater
>            Priority: Minor
>              Labels: stress
>             Fix For: 4.0
>
>         Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the JDKRandomGenerator()
> but in testing of uniform(1..3) we see for 100 iterations it's only outputting 3.  If you
> bump it to 10k it hits all 3 values.
> I made a change to just use the default commons math random generator and now see all
> 3 values for n=10



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

