cassandra-user mailing list archives

From Ralf Steppacher <ralf.viva...@gmail.com>
Subject Re: cassandra-stress tool - InvalidQueryException: Batch too large
Date Tue, 02 Feb 2016 08:07:05 GMT
I have raised https://issues.apache.org/jira/browse/CASSANDRA-11105.

Thanks!
Ralf

> On 01.02.2016, at 15:01, Jake Luciani <jakers@gmail.com> wrote:
> 
> Yeah that looks like a bug.  Can you open a JIRA and attach the full .yaml?
> 
> Thanks!
> 
> 
> On Mon, Feb 1, 2016 at 5:09 AM, Ralf Steppacher <ralf.vivates@gmail.com> wrote:
> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress tool to work for my test scenario. I have followed the example at http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema to create a yaml file describing my test.
> 
> I am collecting events per user id (text, partition key). Events have a session type (text), an event type (text), and a creation time (timestamp) as clustering keys, in that order, plus some more attributes required for rendering the events in a UI. For testing purposes I ended up with the following column spec and insert distribution:
> 
> columnspec:
>   - name: created_at
>     cluster: uniform(10..10000)
>   - name: event_type
>     size: uniform(5..10)
>     population: uniform(1..30)
>     cluster: uniform(1..30)
>   - name: session_type
>     size: fixed(5)
>     population: uniform(1..4)
>     cluster: uniform(1..4)
>   - name: user_id
>     size: fixed(15)
>     population: uniform(1..1000000)
>   - name: message
>     size: uniform(10..100)
>     population: uniform(1..100B)
> 
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/1200000
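As I read the profile, the denominator in `select: fixed(1)/1200000` is meant to match the maximum partition size implied by the cluster distributions above (10000 × 30 × 4 rows), so that on average no more than one row is picked per batch. A minimal sketch of that arithmetic, using the numbers from the yaml (the variable names are illustrative, not part of the stress tool):

```python
# Maximum rows per partition = product of the upper bounds of the
# cluster distributions: created_at x event_type x session_type.
max_rows_per_partition = 10000 * 30 * 4  # 1,200,000

# select: fixed(1)/1200000 is (assumed here) the fraction of a
# partition's rows visited per batch.
select_ratio = 1 / 1200000

# Expected rows per batch for the largest possible partition:
expected_rows = max_rows_per_partition * select_ratio
print(expected_rows)  # 1.0 -> the intent: roughly one row per batch
```

If that assumption about the ratio semantics holds, batches should stay tiny, which is what makes the observed "Batch too large" errors look like a bug.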
> 
> 
> Running the stress tool for just the insert prints
> 
> Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)
> 
> and then immediately starts flooding me with "com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large".
> 
> Why I should be exceeding the "batch_size_fail_threshold_in_kb: 50" in cassandra.yaml I do not understand. My understanding is that the stress tool should generate one row per batch. The size of a single row should not exceed 8 + 10*3 + 5*3 + 15*3 + 100*3 = 398 bytes, assuming a worst case of all text characters being 3-byte unicode characters.
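The worst-case row-size estimate above can be checked with a quick calculation: 8 bytes for the timestamp column plus 3 bytes per character for each text column at its maximum size from the columnspec.

```python
# Worst-case encoded row size, assuming every text character is a
# 3-byte UTF-8 character:
#   created_at   timestamp -> 8 bytes
#   event_type   up to 10 chars
#   session_type 5 chars
#   user_id      15 chars
#   message      up to 100 chars
BYTES_PER_CHAR = 3
row_size = 8 + (10 + 5 + 15 + 100) * BYTES_PER_CHAR
print(row_size)  # 398 bytes -- orders of magnitude below the 50 KB threshold
```

So a single-row batch is nowhere near the 50 KB fail threshold, which supports the one-row-per-batch reading of the profile.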
> 
> How come I end up with batches that exceed the 50kb threshold? Am I missing the point
about the “select” attribute?
> 
> 
> Thanks!
> Ralf
> 
> 
> 
> -- 
> http://twitter.com/tjake
