cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6146) CQL-native stress
Date Thu, 03 Jul 2014 20:50:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902 ]

Benedict commented on CASSANDRA-6146:
-------------------------------------

bq. The value component generator uses the seed of the last clustering component so it always
gets the same value for all rows in a partition, since the seeds are cached.

The seed is different for each row, though? So the seed at each 

bq. You can reproduce by changing the default clustering distribution to uniform(1..1024)


Well, since there are 6 clustering components, a uniform(1..1024) default distribution would
yield an _average_ of 512^6 (=(2^9)^6 = 2^54) rows per partition. Not surprisingly, this
causes an overflow in the calculations. It is probably worth detecting this case and warning
the user that the partition size is absurdly large, and also worth using double instead of
float everywhere we calculate a probability.
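A minimal sketch of the kind of sanity check suggested above, assuming the expected rows per partition is simply the product of each component's mean clustering. The class `PartitionSizeCheck`, its methods and the 1e9 warning threshold are hypothetical illustrations, not stress's actual code; the int version only shows how such a product can silently overflow, not where the real overflow occurs.

```java
import java.util.Arrays;

public class PartitionSizeCheck {
    // Hypothetical threshold above which we warn; not a real stress option.
    static final double MAX_SANE_ROWS_PER_PARTITION = 1e9;

    // Mean of a uniform integer distribution over [min, max].
    static double uniformMean(long min, long max) {
        return (min + max) / 2.0;
    }

    // Expected rows per partition: the product of each component's mean clustering.
    // Computed in double so a huge product stays large instead of wrapping around.
    static double expectedRowsPerPartition(double[] meanClustering) {
        double rows = 1.0;
        for (double mean : meanClustering) {
            rows *= mean;
        }
        return rows;
    }

    // The same product in 32-bit int arithmetic, only to illustrate silent overflow.
    static int expectedRowsAsInt(int meanClustering, int components) {
        int rows = 1;
        for (int i = 0; i < components; i++) {
            rows *= meanClustering;
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] means = new double[6];
        Arrays.fill(means, uniformMean(1, 1024)); // 512.5 per component
        double rows = expectedRowsPerPartition(means);
        if (rows > MAX_SANE_ROWS_PER_PARTITION) {
            System.err.printf("Warning: ~%.3g rows per partition on average; "
                    + "this is an absurdly large partition%n", rows);
        }
        // 512^6 = 2^54 is a multiple of 2^32, so the int product wraps to 0.
        System.out.println(expectedRowsAsInt(512, 6));
    }
}
```

On the float/double point: the spacing between adjacent floats just below 1.0 is 2^-24 (about 6e-8), so a probability such as 1 - 1e-8 rounds to exactly 1.0f, while a double keeps it distinct from 1.0.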

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering component so it always
gets the same value for all rows in a partition, since the seeds are cached.

Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level clustering component)
are the same? Well spotted: this is an off-by-one bug, and I hadn't noticed it because I wasn't
using a clustering > 1 for the leaf. They shouldn't be the same across the whole partition, though.
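To illustrate the off-by-one described above, here is a hypothetical sketch in which each row is identified by its path of per-level seeds. The seed-mixing function, class and method names are assumptions made for illustration, not stress's real generator. Seeding the value from the second-lowest component makes every leaf row under the same parent share one value, while seeding from the leaf gives each row its own.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ValueSeedSketch {
    // Hypothetical seed derivation: mix the parent's seed with the child's index.
    static long childSeed(long parentSeed, int index) {
        long h = parentSeed * 0x9E3779B97F4A7C15L + index;
        return h ^ (h >>> 31);
    }

    // Enumerates every row of a partition as its path of per-level seeds,
    // with `fanout` instances of each clustering component under each parent.
    static List<long[]> rowSeedPaths(long partitionSeed, int levels, int fanout) {
        List<long[]> rows = new ArrayList<>();
        expand(partitionSeed, new long[levels], 0, fanout, rows);
        return rows;
    }

    private static void expand(long parentSeed, long[] path, int depth, int fanout, List<long[]> out) {
        if (depth == path.length) {
            out.add(path.clone());
            return;
        }
        for (int i = 0; i < fanout; i++) {
            path[depth] = childSeed(parentSeed, i);
            expand(path[depth], path, depth + 1, fanout, out);
        }
    }

    // Off-by-one: seeds the value from the second-lowest clustering component,
    // so all leaf rows sharing that component get the same value.
    static long buggyValueSeed(long[] path) {
        return path[path.length - 2];
    }

    // Fixed: seeds the value from the leaf (last) clustering component.
    static long fixedValueSeed(long[] path) {
        return path[path.length - 1];
    }

    // Counts how many distinct value seeds the rows end up with.
    static int distinct(List<long[]> rows, boolean fixed) {
        Set<Long> seeds = new HashSet<>();
        for (long[] path : rows) {
            seeds.add(fixed ? fixedValueSeed(path) : buggyValueSeed(path));
        }
        return seeds.size();
    }

    public static void main(String[] args) {
        List<long[]> rows = rowSeedPaths(42L, 3, 4); // 4^3 = 64 leaf rows
        System.out.println("rows=" + rows.size()
                + " buggy distinct=" + distinct(rows, false)
                + " fixed distinct=" + distinct(rows, true));
    }
}
```

With 3 levels and a fanout of 4, the buggy variant can produce at most 16 distinct values (one per second-level instance) for 64 rows, which is exactly the symptom only visible once the leaf clustering is greater than 1.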

bq. I'm concerned we won't be able to explain how to use this to joe user, but perhaps if we
come up with better terminology and some visual examples it will make more sense. For example,
is the clustering distribution used to define the possible values in a single partition? If
you have a population of uniform(1..1000) and a clustering of fixed(1), you only see one value
per partition.

We may need to bikeshed the nomenclature. I don't think clustering is that tough, though: it
is the number of instances of that component for each instance of its parent (i.e. for C components
with an average clustering of N, there will be N^C rows on average). The only complex bits IMO are
updateratio and useratio; perhaps we could relabel these as 'rowspervisit' and 'rowsperbatch' and
indicate in the description that they are ratios.
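A simplified sketch of the clustering/population distinction described above, under the assumption that each component draws its instance count per parent from the clustering distribution and each instance's value from the population distribution. `ClusteringSketch` and `generatePartition` are hypothetical names, and unlike a real generator this sketch does not deduplicate values within a parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntSupplier;

public class ClusteringSketch {
    // Builds one partition; each row is the list of clustering values from root to leaf.
    // `clustering` gives how many instances of a component to create under each parent instance;
    // `population` gives a value for each instance.
    static List<List<Integer>> generatePartition(int components, IntSupplier clustering, IntSupplier population) {
        List<List<Integer>> rows = new ArrayList<>();
        expand(new ArrayList<>(), components, clustering, population, rows);
        return rows;
    }

    private static void expand(List<Integer> prefix, int remaining, IntSupplier clustering,
                               IntSupplier population, List<List<Integer>> out) {
        if (remaining == 0) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        int count = clustering.getAsInt();
        for (int i = 0; i < count; i++) {
            prefix.add(population.getAsInt());
            expand(prefix, remaining - 1, clustering, population, out);
            prefix.remove(prefix.size() - 1); // backtrack (index-based remove)
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        IntSupplier uniform1to1000 = () -> 1 + rnd.nextInt(1000);
        // clustering fixed(1): one instance, hence one value, per partition
        System.out.println(generatePartition(1, () -> 1, uniform1to1000).size()); // 1
        // clustering fixed(10) with C = 3 components: 10^3 rows
        System.out.println(generatePartition(3, () -> 10, uniform1to1000).size()); // 1000
    }
}
```

The population only bounds which values can appear; the clustering alone decides how many rows a partition has, which is why fixed(1) shows a single value per partition regardless of the population range.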

> CQL-native stress
> -----------------
>
>                 Key: CASSANDRA-6146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.1
>
>         Attachments: 6146-v2.txt, 6146.txt, 6164-v3.txt
>
>
> The existing CQL "support" in stress is not worth discussing. We need to start over,
> and we might as well kill two birds with one stone and move to the native protocol while we're
> at it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
