cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Clustering key values not distributed
Date Fri, 05 Feb 2016 00:31:39 GMT
Hi Ralf,

I am not familiar with the "columnspec" but I'll try to help.

First, are you sure the result is not the expected one? Did you try a
select query that specifies a partition key, to check the number of rows
returned? Partitions do not come back in any meaningful key order on a
full scan, so a query like the one below is probably a better check than
fetching everything and limiting to 30 rows.

cqlsh> select user_id, event_type, session_type, created_at from
stresscql.batch_too_large WHERE user_id = '%\x7f\x03/.d29<i\$u\x114'
LIMIT 2000 ;
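
Once you have rows back from a query like this, a small script can tell you
how many rows each partition holds and how many distinct values each
clustering column takes. The following is only a minimal Python sketch: the
function name summarize_partitions and the sample rows are made up for
illustration, and it assumes the rows are available as plain
(user_id, event_type, session_type, created_at) tuples.

```python
from collections import defaultdict

# Clustering columns in the order they appear in the select above.
CLUSTERING_COLUMNS = ("event_type", "session_type", "created_at")


def summarize_partitions(rows):
    """Count rows and distinct clustering values per partition key.

    rows: iterable of (user_id, event_type, session_type, created_at) tuples.
    """
    counts = defaultdict(int)
    distinct = defaultdict(lambda: {col: set() for col in CLUSTERING_COLUMNS})
    for user_id, *clustering in rows:
        counts[user_id] += 1
        for col, value in zip(CLUSTERING_COLUMNS, clustering):
            distinct[user_id][col].add(value)
    return {
        uid: {"rows": counts[uid],
              **{col: len(vals) for col, vals in distinct[uid].items()}}
        for uid in counts
    }


# Placeholder rows for illustration only (hypothetical values).
sample = [
    ("user-a", "click", "web", "2012-10-19 08:14:11"),
    ("user-a", "click", "web", "2004-11-08 04:04:56"),
    ("user-b", "view", "app", "2015-10-12 17:48:51"),
]

for uid, stats in summarize_partitions(sample).items():
    print(uid, stats)
```

If event_type and session_type always report 1 distinct value while
created_at reports 10, that matches the behavior described in the original
message.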

>  - name: created_at
>    cluster: uniform(10..10)


This is very restrictive, and it looks like what you end up with: exactly
10 distinct created_at values per partition. If I understand your need, you
would like distinct values for the other clustering keys too. Is that
correct?

The order of clustering columns matters in Cassandra, and it may matter for
this test as well: maybe uniform(10..10) on the last column means that the
preceding part of the key stays the same for all the rows. This is an
assumption, probably wrong, that you might want to check. Or maybe you are
defining these columns as part of the partition key, like ((user_id,
event_type, session_type), created_at)? The schema would help here.

Since this key is the last one and you ask for exactly 10 of them, this
seems to force the other clustering columns to stick to a single value
somehow, but once again, I am not sure how it works :/. So I would basically
play with the numbers to try to understand how this tool behaves with
multiple clustering keys.
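
While playing with the numbers, it may help to write down what you expect
before each run. The sketch below only models the expectation from your
message, namely that the per-column cluster ranges multiply to give the
rows per partition. Whether cassandra-stress really behaves this way is
exactly what is in question, and the function name is hypothetical.

```python
from math import prod


def expected_rows_per_partition(cluster_ranges):
    """Return (min, max) rows per partition, ASSUMING the per-column
    cluster counts simply multiply. This is an assumption about the
    tool, not confirmed behavior.

    cluster_ranges: dict of column name -> (low, high) from uniform(low..high).
    """
    lows = [low for low, _ in cluster_ranges.values()]
    highs = [high for _, high in cluster_ranges.values()]
    return prod(lows), prod(highs)


# The column spec from the original message.
original = {
    "session_type": (1, 4),
    "event_type": (1, 30),
    "created_at": (10, 10),
}
print(expected_rows_per_partition(original))  # (10, 1200)

# Same spec, with created_at changed to uniform(3..20).
variant = dict(original, created_at=(3, 20))
print(expected_rows_per_partition(variant))  # (3, 2400)
```

Comparing these ranges with the row counts you actually get per partition
should show quickly whether the multiplication assumption holds.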

What is the result with the following, for example?

 - name: created_at
   cluster: uniform(3..20)

I am really uncertain about this, but I imagine it is better to have
something to try than no answer :-).

Let us know how it goes.

C*heers,
-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-02 8:55 GMT+00:00 Ralf Steppacher <ralf.vivates@gmail.com>:

> I am trying to get the stress tool to generate random values for three
> clustering keys. I am trying to simulate collecting events per user id
> (text, partition key). Events have a session type (text), event type
> (text), and creation time (timestamp) (clustering keys, in that order). For
> testing purposes I ended up with the following column spec:
>
> columnspec:
>  - name: created_at
>    cluster: uniform(10..10)
>  - name: event_type
>    size: uniform(5..10)
>    population: uniform(1..30)
>    cluster: uniform(1..30)
>  - name: session_type
>    size: fixed(5)
>    population: uniform(1..4)
>    cluster: uniform(1..4)
>  - name: user_id
>    size: fixed(15)
>    population: uniform(1..1000000)
>  - name: message
>    size: uniform(10..100)
>    population: uniform(1..100B)
>
> My expectation was that this would lead to anywhere between 10 and 1200
> rows to be created per partition key. But it seems that exactly 10 rows are
> being created, with the created_at timestamp being the only variable that
> is assigned variable values (per partition key). The session_type and
> event_type variables are assigned fixed values. This is even the case if I
> set the cluster distribution to uniform(1..30) and uniform(4..4)
> respectively. With this setting I expected 1200 rows per partition key to
> be created, as announced when running the stress tool, but it is still 10.
>
> [rsteppac@centos bin]$ ./cassandra-stress user
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200]
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
>
>
> Sample of generated data:
>
> cqlsh> select user_id, event_type, session_type, created_at from
> stresscql.batch_too_large LIMIT 30 ;
>
> user_id                     | event_type       | session_type | created_at
> -----------------------------+------------------+--------------+--------------------------
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2012-10-19 08:14:11+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2004-11-08 04:04:56+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 2002-10-15 00:39:23+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-08-31 19:56:30+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1999-04-02 20:46:26+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1990-10-08 03:27:17+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1984-03-31 23:30:34+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1975-11-16 02:41:28+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-04-07 07:23:48+0000
>    %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| |     P+|u\x0b | 1970-03-08 23:23:04+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2015-10-12 17:48:51+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2010-10-28 06:21:13+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-06-28 03:34:41+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2005-01-29 05:26:21+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2003-03-27 01:31:24+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2002-03-29 14:22:43+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 2000-06-15 14:54:29+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1998-03-08 13:31:54+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1988-01-21 06:38:40+0000
>       N!\x0eUA7^r7d\x06J<v< |  \x1bm/c/Th\x07U |        E}P^k | 1975-08-03 21:16:47+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2014-11-23 17:05:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-23 23:20:54+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2012-02-19 12:05:15+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2005-10-17 04:22:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 2003-02-24 19:45:06+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1996-12-18 06:18:31+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1991-06-10 22:07:45+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1983-05-05 12:29:09+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1972-04-17 21:24:52+0000
> oy\x1c0077H"i\x07\x13_%\x06 |    | \nz@Qj\x1cB |        E}P^k | 1971-05-09 23:00:02+0000
>
> (30 rows)
> cqlsh>
>
> If I remove the created_at clustering keys then the other two clustering
> keys are assigned variable values per partition key.
>
> Is there a way to achieve this with the created_at clustering key being
> present?
>
>
> Thanks!
> Ralf
