ignite-dev mailing list archives

From Igor Rudyak <irud...@gmail.com>
Subject Re: Batch support in Cassandra store
Date Wed, 27 Jul 2016 01:16:17 GMT
Dmitriy,

The same approach is used for all async read/write/delete operations -
the Cassandra session simply provides an executeAsync(statement) function
for all types of operations.
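As a rough sketch of that "fire everything asynchronously, wait once" pattern
(this uses JDK futures as a stand-in; the real DataStax driver's
Session.executeAsync returns a ResultSetFuture, and the statement strings here
are only illustrative):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncWriteSketch {
    // Stand-in for Session.executeAsync(statement): runs one statement
    // asynchronously and completes when the write would be acknowledged.
    static CompletableFuture<Boolean> executeAsync(String statement) {
        return CompletableFuture.supplyAsync(() -> {
            // A real implementation would send the statement to Cassandra here.
            return true;
        });
    }

    public static void main(String[] args) {
        List<String> statements = List.of(
            "INSERT INTO kv (k, v) VALUES ('a', 1)",
            "INSERT INTO kv (k, v) VALUES ('b', 2)",
            "DELETE FROM kv WHERE k = 'c'");

        // Fire all statements without waiting, then block once on the futures.
        List<CompletableFuture<Boolean>> futures = statements.stream()
            .map(AsyncWriteSketch::executeAsync)
            .collect(Collectors.toList());
        futures.forEach(CompletableFuture::join);
        System.out.println("completed " + futures.size() + " operations");
        // prints: completed 3 operations
    }
}
```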

To be more specific about Cassandra batches, there are actually two types:

1) *Logged batch* (aka atomic) - the main purpose of such batches is to
keep duplicated data in sync while updating multiple tables, but at the
cost of performance.

2) *Unlogged batch* - the only specific case where such a batch helps is
when all updates are addressed to a *single* partition key and the batch has
a "*reasonable size*". In such a situation there *could be* a performance
benefit if you are using Cassandra's *TokenAware* load balancing policy. In
this particular case all the updates go directly, without any additional
coordination, to the primary node responsible for storing data for that
partition key.
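A toy illustration of why the single-partition case is special (replicaFor
here is a hypothetical stand-in for the driver's TokenAwarePolicy, which
really hashes the partition key with Murmur3 and consults the token ring):

```java
import java.util.List;

public class TokenAwareSketch {
    // Hypothetical routing function: map a partition key to the node that
    // owns it. The real TokenAwarePolicy uses Murmur3 tokens, not hashCode.
    static int replicaFor(String partitionKey, int nodeCount) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 4;
        // All statements of a single-partition unlogged batch resolve to the
        // same node, so the whole batch can skip coordinator forwarding.
        List<String> batchKeys = List.of("user:42", "user:42", "user:42");
        batchKeys.forEach(k ->
            System.out.println(k + " -> node " + replicaFor(k, nodes)));
    }
}
```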

The *generic rule* is that *individual updates in async mode* provide
the best performance (
https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html). That's
because they spread all updates across the whole cluster. In contrast, when
you use batches, you are actually putting a huge amount of pressure on a
single coordinator node, because the coordinator needs to forward each
individual insert/update/delete to the correct replicas. In general you
just lose all the benefit of Cassandra's TokenAware load balancing policy
when you update different partitions in a single round trip to the database.

Probably the only enhancement that could be made is to split our batch
into smaller batches, each of which updates records having the same
partition key. In that case it could provide some performance benefit when
used in combination with Cassandra's TokenAware policy. But there are several
concerns:

1) It looks like a rather rare case.
2) It makes error handling more complex - you don't know which operations
in a batch succeeded and which failed, so you have to retry the whole batch.
3) Retry logic could produce more load on the cluster - with individual
updates you only need to retry the mutations that failed, while with
batches you need to retry the whole batch.
4) *Unlogged batch is deprecated in Cassandra 3.0* (
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html), which
is the version we are currently using for the Ignite Cassandra module.
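For what it's worth, the grouping step itself would be trivial. A rough
sketch, with a hypothetical Mutation type standing in for whatever the cache
store's write entries would look like:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchSplitSketch {
    // Hypothetical mutation: a partition key plus the statement to execute.
    record Mutation(String partitionKey, String cql) {}

    // Split one large batch into per-partition sub-batches, so each
    // sub-batch could be sent token-aware straight to its primary node.
    static Map<String, List<Mutation>> splitByPartition(List<Mutation> batch) {
        return batch.stream()
            .collect(Collectors.groupingBy(Mutation::partitionKey));
    }

    public static void main(String[] args) {
        List<Mutation> batch = List.of(
            new Mutation("k1", "UPDATE t SET v = 1 WHERE k = 'k1'"),
            new Mutation("k2", "UPDATE t SET v = 2 WHERE k = 'k2'"),
            new Mutation("k1", "UPDATE t SET v = 3 WHERE k = 'k1'"));
        // Two sub-batches: one with both "k1" mutations, one with "k2".
        splitByPartition(batch).forEach((k, ms) ->
            System.out.println(k + ": " + ms.size() + " mutation(s)"));
    }
}
```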


Igor Rudyak



On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
wrote:

>
>
> On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irudyak@gmail.com> wrote:
>
>> Hi Valentin,
>>
>> For writeAll/readAll the Cassandra cache store implementation uses async
>> operations (http://www.datastax.com/dev/blog/java-driver-async-queries)
>> and futures, which have the best characteristics in terms of performance.
>>
>>
> Thanks, Igor. This link describes the query operations, but I could not
> find any mention of writes.
>
>
>> The Cassandra BATCH statement is actually quite often an anti-pattern for
>> those who come from the relational world. The BATCH statement concept in
>> Cassandra is totally different from the relational one and is not meant
>> for optimizing batch/bulk operations. The main purpose of a Cassandra
>> BATCH is to keep denormalized data in sync - for example, when you
>> duplicate the same data into several tables. All other cases are not
>> recommended for Cassandra batches:
>>  -
>>
>> https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
>>  -
>>
>> http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
>>  - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>
>> It's also worth mentioning that in the CassandraCacheStore implementation
>> (actually in CassandraSessionImpl) every operation with Cassandra is
>> wrapped in a loop. The reason is that, in case of failure, up to 20
>> attempts will be made to retry the operation, with incrementally
>> increasing timeouts starting from 100ms and specific exception handling
>> logic (Cassandra host unavailability, etc.). Thus it provides a quite
>> reliable persistence mechanism. According to load tests, even on a heavily
>> overloaded Cassandra cluster (CPU load > 10 per core) there were no lost
>> writes/reads/deletes, and at most 6 attempts were needed for one operation.
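The retry loop described in the quoted message could be sketched roughly as
follows (the constant names and the linear backoff step are assumptions for
illustration, not the actual CassandraSessionImpl code):

```java
import java.util.function.Supplier;

public class RetrySketch {
    static final int MAX_ATTEMPTS = 20;     // attempts before giving up
    static final long BASE_SLEEP_MS = 100;  // first backoff interval

    // Retry an operation with incrementally increasing sleeps between
    // attempts, rethrowing the last failure if all attempts are exhausted.
    static <T> T executeWithRetry(Supplier<T> op) throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;                              // e.g. host unavailable
                Thread.sleep(BASE_SLEEP_MS * attempt); // 100ms, 200ms, 300ms...
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated operation that fails twice, then succeeds.
        String result = executeWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        });
        System.out.println("succeeded after " + calls[0] + " attempts: " + result);
        // prints: succeeded after 3 attempts: ok
    }
}
```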
>>
>
> I think that the main point about Cassandra batch operations is not
> reliability but performance. If a user batches up 100s of updates in 1
> Cassandra batch, it will be a lot faster than doing them 1-by-1 in
> Ignite. Wrapping them into an Ignite "putAll(...)" call just seems more
> logical to me, no?
>
>
>>
>> Igor Rudyak
>>
>> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>>
>> > Hi Igor,
>> >
>> > I noticed that the current Cassandra store implementation doesn't
>> > support batching for the writeAll and deleteAll methods; it simply
>> > executes all updates one by one (asynchronously in parallel).
>> >
>> > I think it can be useful to provide such support and created a ticket
>> > [1]. Can you please give your input on this? Does it make sense in your
>> > opinion?
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
>> >
>> > -Val
>> >
>>
>
>
