cassandra-user mailing list archives

From Eric Stevens <migh...@gmail.com>
Subject Re: Cassandra 2.0 Batch Statement for timeseries schema
Date Thu, 05 Nov 2015 15:50:46 GMT
If you're talking about logged batches, these absolutely have an impact on
performance, on the order of 30%.  The whole batch will succeed or fail as a
unit, but throughput will go down and load will go up.  Keep in mind that
logged batches are atomic but not isolated, i.e. a reader can see some of the
batch's writes before the rest have been applied (a dirty read).  See
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
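
For reference, a minimal sketch of what the two kinds of batch look like as CQL text, built with a small Python helper. The helper name `build_batch` is hypothetical, and the column names `order_id`, `order_blob`, `shard_and_date` and `sequence_id` are assumed adaptations of the quoted schema (unquoted CQL identifiers cannot contain hyphens). It only assembles the statement text; it does not talk to a cluster.

```python
def build_batch(statements, logged=True):
    """Wrap CQL statements in a BEGIN ... APPLY BATCH block.

    A plain BEGIN BATCH is a logged (atomic) batch; BEGIN UNLOGGED BATCH
    skips the batch log.
    """
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize each statement so it ends with exactly one semicolon.
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


# Assumed statements for the two tables in the quoted message.
batch_cql = build_batch([
    "INSERT INTO orders (order_id, order_blob) VALUES (?, ?)",
    "INSERT INTO order_sequence (shard_and_date, sequence_id) VALUES (?, ?)",
])
print(batch_cql)
```

Passing `logged=False` produces the unlogged form of the same batch.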

If you're not doing some kind of CAS operation inside the logged batch,
then the only advantage of a logged batch over an unlogged batch is that
when the consistency level can't be met for the second statement (so the
write fails), the first statement will not succeed either (but at that
point your cluster is effectively offline).

Unlogged batches offer very few guarantees over single statements, and even
have the drawback of eliminating your driver's ability to operate in a
token aware fashion.
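
As a rough illustration of the single-statement alternative, the sketch below issues the two writes as separate asynchronous statements instead of one batch. It assumes `session` behaves like a DataStax Python driver `Session` (with token-aware load balancing configured) and that `insert_order` and `insert_sequence` are statements prepared elsewhere; the function name and parameter names are hypothetical. There is no atomicity across the two writes here: one can succeed while the other fails.

```python
def write_order_and_sequence(session, insert_order, insert_sequence,
                             order_id, order_blob,
                             shard_and_date, sequence_id):
    """Send both writes as independent statements and wait for both.

    Each statement targets a single partition, so a token-aware driver
    can route it straight to a replica for that partition.
    """
    futures = [
        session.execute_async(insert_order, (order_id, order_blob)),
        session.execute_async(insert_sequence, (shard_and_date, sequence_id)),
    ]
    # result() blocks until the write completes and raises if it failed,
    # so the caller decides how to retry or compensate.
    for future in futures:
        future.result()
```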

On Thu, Nov 5, 2015 at 8:22 AM Sachin Nikam <sknikam@gmail.com> wrote:

> I currently have a keyspace with a table definition that looks like this.
>
>
> CREATE TABLE orders (
>   order_id bigint PRIMARY KEY,
>   order_blob text
> );
>
> This table will have a write load of ~40-100 tps and a read load of ~200-400 tps.
>
> We are now considering adding another table definition which closely resembles a timeseries table.
>
> CREATE TABLE order_sequence (
>   // The shard ID is the order ID modulo the number of nodes in the
>   // Cassandra ring. It is then suffixed with the current date.
>   // An example would be 2-Nov-11-2015
>   shard_and_date text,
>
>   // This will be a simple flake-generated long
>   sequence_id bigint,
>   PRIMARY KEY (shard_and_date, sequence_id)
> ) WITH CLUSTERING ORDER BY (sequence_id DESC);
>
>
> The goal of this table is to answer queries like "Get me the count of orders changed in a given sequence-id range". This query will be called once every 5 sec.
>
> The plan is to write both these tables in a single BATCH statement.
>
> 1. Will this impact the write latency?
>
> 2. Will it also impact the read latency of the "orders" table?
>
> 3. Will it impact the overall stability of the cluster?
>
>
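
The shard-and-date partition key described in the quoted message could be derived roughly as follows. This is a sketch under assumptions: the function name `shard_and_date` and the fixed `MONTHS` table are hypothetical, and the date layout is inferred from the single example `2-Nov-11-2015` given above.

```python
from datetime import date

# Fixed English month abbreviations, so the key does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def shard_and_date(order_id: int, num_shards: int, day: date) -> str:
    """Build a partition key such as '2-Nov-11-2015'.

    The shard is order_id modulo num_shards (the quoted message uses the
    number of nodes in the ring), followed by the date.
    """
    shard = order_id % num_shards
    return f"{shard}-{MONTHS[day.month - 1]}-{day.day:02d}-{day.year}"


key = shard_and_date(17, 5, date(2015, 11, 11))
print(key)  # 2-Nov-11-2015
```

Note that the same order ID maps to a different shard whenever `num_shards` changes, so readers need to know which shard count was in effect for a given date.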
