cassandra-commits mailing list archives

From "Oleg Anastasyev (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6134) More efficient BatchlogManager
Date Wed, 02 Oct 2013 16:43:42 GMT


Oleg Anastasyev commented on CASSANDRA-6134:

Alex: It seems that the current schema is completely incompatible with the new one.
So could you please take a look and decide whether the new batchlog manager is useful to you,
and therefore whether it is worth implementing a migration?

> More efficient BatchlogManager
> ------------------------------
>                 Key: CASSANDRA-6134
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Oleg Anastasyev
>            Priority: Minor
>         Attachments: BatchlogManager.txt
> As we discussed earlier in CASSANDRA-6079, this is the new BatchlogManager.
> It stores batch records in the following table:
> {code}
> CREATE TABLE batchlog (
>   id_partition int,
>   id timeuuid,
>   data blob,
>   PRIMARY KEY (id_partition, id)
> );
> {code}
> where id_partition is the minutes-since-epoch value of the id timeuuid.
> So when it scans for batches to replay, it scans within a single partition for a slice
> of ids from the last processed date until now minus the write timeout.
> So a normal cycle makes neither a full batchlog CF scan nor a lot of random reads.
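
The partitioning described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class and method names (BatchlogPartition, unixMillis, idPartition, timeuuidAt) are hypothetical, and the sketch relies only on java.util.UUID.timestamp(), which for version 1 UUIDs returns 100 ns units since 1582-10-15.

```java
import java.util.UUID;

class BatchlogPartition {
    // Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Unix time in milliseconds encoded in a version 1 (time-based) UUID. */
    static long unixMillis(UUID timeuuid) {
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    /** Hypothetical helper: minutes since the Unix epoch, used as id_partition. */
    static int idPartition(UUID timeuuid) {
        return (int) (unixMillis(timeuuid) / 60_000);
    }

    /** Builds a version 1 UUID for a Unix time; clock sequence and node are zeroed. */
    static UUID timeuuidAt(long unixMillis) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32)        // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)   // time_mid
                 | 0x1000L                           // version 1
                 | ((ts >>> 48) & 0x0FFFL);          // time_hi
        long lsb = 0x8000000000000000L;              // RFC 4122 variant bits only
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = timeuuidAt(System.currentTimeMillis());
        System.out.println("id=" + id + " id_partition=" + idPartition(id));
    }
}
```

With such a key, a replay cycle only needs the partition(s) for the minutes between the last processed time and now minus the write timeout, and within each a slice on id.
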
> Other improvements:
> 1. It runs every 1/2 of the write timeout and replays all batches written within 0.9 * write
> timeout from now. This way we ensure that batched updates will be replayed by the moment the
> client times out from the coordinator.
> 2. It submits all mutations from a single batch in parallel (as StorageProxy does). The old
> implementation played them one by one, so a client could see half-applied batches in the CF
> for a long time (depending on the size of the batch).
> 3. It fixes a subtle race condition that caused incorrect hint TTL calculation.
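
To illustrate item 2, here is a minimal Java sketch of submitting all mutations of one batch concurrently and waiting for all of them, instead of applying them one by one. It is not the patch's implementation and does not use StorageProxy: the names ParallelReplaySketch and replayBatch are hypothetical, and the mutation type and the apply callback are stand-ins for whatever actually performs the write.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class ParallelReplaySketch {
    /**
     * Hypothetical helper: starts every mutation of one batch on the pool,
     * then blocks until all of them have completed.
     */
    static <M> void replayBatch(List<M> mutations, Consumer<M> apply, ExecutorService pool) {
        List<CompletableFuture<Void>> writes = mutations.stream()
                .map(m -> CompletableFuture.runAsync(() -> apply.accept(m), pool))
                .collect(Collectors.toList());
        // join() rethrows (wrapped) if any single write failed.
        CompletableFuture.allOf(writes.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Set<String> applied = ConcurrentHashMap.newKeySet();
        try {
            replayBatch(Arrays.asList("m1", "m2", "m3"), applied::add, pool);
        } finally {
            pool.shutdown();
        }
        System.out.println("applied " + applied.size() + " mutations");
    }
}
```

Because the writes are started together, the window in which a reader can observe a partially applied batch is bounded by the slowest single write rather than by the sum of all of them.
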

This message was sent by Atlassian JIRA
