cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yap Sok Ann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed
Date Wed, 28 Sep 2016 18:53:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530531#comment-15530531
] 

Yap Sok Ann commented on CASSANDRA-11138:
-----------------------------------------

Encounter similar problem with the current trunk, with or without the patch.

With the following sample config:
{code:none}
table: table1
table_definition: |
  CREATE TABLE table1 (
    pk1 text,
    pk2 int,
    col1 timestamp,
    col2 text,
    col3 blob,
    PRIMARY KEY ((pk1, pk2), col1, col2)
  ); 

columnspec:
  - name: col1
    cluster: uniform(1..3)

  - name: col2
    cluster: uniform(1..4)
{code}

after insert, there will be duplicate values for {{col2}} *and* {{col3}}:
{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447;  

 pk1      | pk2        | col1                     | col2     | col3
----------+------------+--------------------------+----------+--------------------
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+0000 | QyCJtb6` | 0x5728b1b79dd2372a
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+0000 | QyCJtb6` | 0x5728b1b79dd2372a

(2 rows)
{code}

If I just remove {{col2}} from clustering key, then there is no duplicate:
{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447;

 pk1      | pk2        | col1                     | col2     | col3
----------+------------+--------------------------+----------+----------------
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+0000 |  '{9j\(; | 0x1080f88c325e
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+0000 | sA0wlY>' | 0x763588f2f5a8

(2 rows)
{code}

> cassandra-stress tool - clustering key values not distributed
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-11138
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>            Reporter: Ralf Steppacher
>              Labels: stress
>         Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three clustering keys.
I am trying to simulate collecting events per user id (text, partition key). Events have a
session type (text), event type (text), and creation time (timestamp) (clustering keys, in
that order). For testing purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..1000000)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows to be created
per partition key. But it seems that exactly 10 rows are being created, with the {{created_at}}
timestamp being the only variable that is assigned variable values (per partition key). The
{{session_type}} and {{event_type}} variables are assigned fixed values. This is even the
case if I set the cluster distribution to uniform(30..30) and uniform(4..4) respectively.
With this setting I expected 1200 rows per partition key to be created, as announced when
running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\)
-log level=verbose file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node
10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] total rows
in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from stresscql.batch_too_large
LIMIT 30 ;
> user_id                     | event_type       | session_type | created_at
> -----------------------------+------------------+--------------+--------------------------
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2012-10-19 08:14:11+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2004-11-08 04:04:56+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2002-10-15 00:39:23+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-08-31 19:56:30+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-04-02 20:46:26+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1990-10-08 03:27:17+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1984-03-31 23:30:34+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1975-11-16 02:41:28+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-04-07 07:23:48+0000
>   %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-03-08 23:23:04+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2015-10-12 17:48:51+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2010-10-28 06:21:13+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-06-28 03:34:41+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-01-29 05:26:21+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2003-03-27 01:31:24+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2002-03-29 14:22:43+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2000-06-15 14:54:29+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1998-03-08 13:31:54+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1988-01-21 06:38:40+0000
>      N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1975-08-03 21:16:47+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2014-11-23 17:05:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-23 23:20:54+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-19 12:05:15+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2005-10-17 04:22:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2003-02-24 19:45:06+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1996-12-18 06:18:31+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1991-06-10 22:07:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1983-05-05 12:29:09+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1972-04-17 21:24:52+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1971-05-09 23:00:02+0000
> (30 rows)
> cqlsh>
> {noformat}
> If I remove the {{created_at}} clustering key, then the other two clustering keys are
being assigned variable values per partition key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message