ignite-dev mailing list archives

From Igor Rudyak <irud...@gmail.com>
Subject Re: Batch support in Cassandra store
Date Sat, 30 Jul 2016 01:31:47 GMT
Hi Valentin,

Sounds reasonable. I'll create a ticket to add Cassandra logged batches and
will try to prepare some load tests to investigate whether unlogged batches
can provide better performance. I will also add a ticket for RAMP as a
long-term enhancement.

Igor Rudyak

On Fri, Jul 29, 2016 at 5:45 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Hi Igor,
>
> 1) Yes, I'm talking about splitting the entry set into per-partition (or
> per-node) batches. Having entries that are stored on different nodes in
> the same batch doesn't make much sense, of course.
>
> 2) RAMP looks interesting, but it seems to be a pretty complicated task.
> How about adding support for built-in logged batches (this should be
> fairly easy to implement) and then improving atomicity as a second phase?
>
> -Val
>
> On Fri, Jul 29, 2016 at 5:19 PM, Igor Rudyak <irudyak@gmail.com> wrote:
>
>> Hi Valentin,
>>
>> 1) Regarding unlogged batches, I don't think it makes sense to support
>> them, because:
>> - They are deprecated starting from Cassandra 3.0 (which we are currently
>> using in the Cassandra module)
>> - According to the Cassandra documentation (
>> http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html),
>> "Batches are often mistakenly used in an attempt to optimize performance".
>> The Cassandra team says that avoiding batches (
>> https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.rxkmfe209)
>> is the fastest way to load data. I checked it with batches containing
>> records with different partition keys and it's definitely true. For a
>> small batch of records all having the same partition key (affinity in
>> Ignite) batches could provide better performance, but I didn't
>> investigate this case deeply (what the optimal batch size is, how
>> significant the performance benefit is, etc.). I can try to do some load
>> tests to get a better understanding of this.
>>
>> 2) Regarding logged batches, I think it makes sense to support them
>> in the Cassandra module for transactional caches. The bad thing is that
>> they don't provide isolation; the good thing is that they guarantee all
>> your changes will eventually be committed and visible to clients. Thus
>> it's still better than nothing... However, there is a better approach. We
>> can implement a transactional protocol on top of Cassandra, which will
>> give us atomic read isolation: you'll either see all the changes made by
>> a transaction or none of them. For example, we could implement RAMP
>> transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf), since
>> RAMP has rather low overhead.
>>
>> Igor Rudyak
>>
>> On Thu, Jul 28, 2016 at 11:00 PM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>>
>>> Hi Igor,
>>>
>>> I'm not a big Cassandra expert, but here are my thoughts.
>>>
>>> 1. Sending updates in a batch is always better than sending them one by
>>> one. For example, if you do putAll in Ignite with 100 entries, and these
>>> entries are split across 5 nodes, the client will send 5 requests instead
>>> of 100. This provides significant performance improvement. Is there a way
>>> to use similar approach in Cassandra?
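The per-node splitting described above can be sketched roughly like this. This is a toy illustration, not the actual Ignite client code; `nodeFor` is a hypothetical hash-based stand-in for Ignite's affinity function:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: split a putAll entry set into per-node batches, so 100 entries
// spread over 5 nodes become 5 requests instead of 100. nodeFor is an
// assumed stand-in for a real affinity/partition mapping.
public class PerNodeBatching {
    // Map a key to one of nodeCount nodes (illustrative hash-based mapping).
    static int nodeFor(int key, int nodeCount) {
        return Math.floorMod(Integer.hashCode(key), nodeCount);
    }

    static Map<Integer, Map<Integer, String>> splitByNode(Map<Integer, String> entries,
                                                          int nodeCount) {
        Map<Integer, Map<Integer, String>> batches = new HashMap<>();
        for (Map.Entry<Integer, String> e : entries.entrySet())
            batches.computeIfAbsent(nodeFor(e.getKey(), nodeCount), n -> new HashMap<>())
                   .put(e.getKey(), e.getValue());
        return batches;
    }

    public static void main(String[] args) {
        Map<Integer, String> entries = new HashMap<>();
        for (int i = 0; i < 100; i++) entries.put(i, "v" + i);
        // One request per node instead of one per entry.
        System.out.println(splitByNode(entries, 5).size()); // 5
    }
}
```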
>>> 2. As for logged batches, I can easily believe that this is a rarely
>>> used feature, but since it exists in Cassandra, I can't find a single
>>> reason not to support it in our store as an option. Users that come
>>> across those rare cases will only say thank you to us :)
>>>
>>> What do you think?
>>>
>>> -Val
>>>
>>> On Thu, Jul 28, 2016 at 10:41 PM, Igor Rudyak <irudyak@gmail.com> wrote:
>>>
>>>> There are actually some cases when atomic read isolation in Cassandra
>>>> could be important. Let's assume a batch was persisted in Cassandra,
>>>> but not finalized yet: a read operation from Cassandra returns us only
>>>> partially committed data of the batch. In such a situation we have
>>>> problems when:
>>>>
>>>> 1) Some of the batch records have already expired from the Ignite cache
>>>> and we are reading them from the persistent store (Cassandra in our
>>>> case).
>>>>
>>>> 2) All Ignite nodes storing the batch records (or a subset of them)
>>>> died (or, for example, became unavailable for 10 seconds because of a
>>>> network problem). While reading such records from the Ignite cache we
>>>> will be redirected to the persistent store.
>>>>
>>>> 3) A network separation occurred in such a way that we now have two
>>>> Ignite clusters, but all the replicas of the batch data are located in
>>>> only one of them. Again, while reading such records from the Ignite
>>>> cache on the second cluster we will be redirected to the persistent
>>>> store.
>>>>
>>>> In all the mentioned cases, if the Cassandra batch isn't finalized yet,
>>>> we will read partially committed transaction data.
>>>>
>>>>
>>>> On Thu, Jul 28, 2016 at 6:52 AM, Luiz Felipe Trevisan <
>>>> luizfelipe.trevisan@gmail.com> wrote:
>>>>
>>>> > I totally agree with you regarding the guarantees we have with
>>>> > logged batches, and I'm also pretty much aware of the performance
>>>> > penalty involved in using this solution.
>>>> >
>>>> > But since all read operations are executed via Ignite, isolation at
>>>> > the Cassandra level is not really important. I think the only
>>>> > guarantee really needed is that we don't end up with a partial insert
>>>> > in Cassandra in case we have a failure in Ignite and we lose the node
>>>> > that was responsible for this write operation.
>>>> >
>>>> > My other assumption is that the write operation needs to finish
>>>> > before an eviction happens for this entry and we lose the data in the
>>>> > cache (since batches don't guarantee isolation). However, if we
>>>> > cannot achieve this, I don't see why we'd use Ignite as a cache
>>>> > store.
>>>> >
>>>> > Luiz
>>>> >
>>>> > --
>>>> > Luiz Felipe Trevisan
>>>> >
>>>> > On Wed, Jul 27, 2016 at 4:55 PM, Igor Rudyak <irudyak@gmail.com>
>>>> wrote:
>>>> >
>>>> >> Hi Luiz,
>>>> >>
>>>> >> Logged batches are not the solution to achieve an atomic view of
>>>> >> your Ignite transaction changes in Cassandra.
>>>> >>
>>>> >> The problem with logged (aka atomic) batches is that while they
>>>> >> guarantee that if any part of the batch succeeds, all of it will, no
>>>> >> other transactional enforcement is done at the batch level. For
>>>> >> example, there is no batch isolation. Clients are able to read the
>>>> >> first updated rows from the batch while other rows are still being
>>>> >> updated on the server (in RDBMS terminology this means the
>>>> >> *READ-UNCOMMITTED* isolation level). Thus Cassandra means "atomic"
>>>> >> in the database sense that if any part of the batch succeeds, all of
>>>> >> it will.
>>>> >>
>>>> >> Probably the best way to achieve read atomic isolation for an Ignite
>>>> >> transaction persisting data into Cassandra is to implement RAMP
>>>> >> transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf) on
>>>> >> top of Cassandra.
>>>> >>
>>>> >> I may create a ticket for this if the community would like it.
>>>> >>
>>>> >>
>>>> >> Igor Rudyak
>>>> >>
>>>> >>
>>>> >> On Wed, Jul 27, 2016 at 12:55 PM, Luiz Felipe Trevisan <
>>>> >> luizfelipe.trevisan@gmail.com> wrote:
>>>> >>
>>>> >>> Hi Igor,
>>>> >>>
>>>> >>> Does it make sense to you to use logged batches to guarantee
>>>> >>> atomicity in Cassandra in cases where we are doing a cross-cache
>>>> >>> transaction operation?
>>>> >>>
>>>> >>> Luiz
>>>> >>>
>>>> >>> --
>>>> >>> Luiz Felipe Trevisan
>>>> >>>
>>>> >>> On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <
>>>> >>> dsetrakyan@apache.org> wrote:
>>>> >>>
>>>> >>>> I am still very confused. Ilya, can you please explain what
>>>> >>>> happens in Cassandra if a user calls the IgniteCache.putAll(...)
>>>> >>>> method?
>>>> >>>>
>>>> >>>> In Ignite, if putAll(...) is called, Ignite will make the best
>>>> >>>> effort to execute the update as a batch, in which case the
>>>> >>>> performance is better. What is the analogy in Cassandra?
>>>> >>>>
>>>> >>>> D.
>>>> >>>>
>>>> >>>> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irudyak@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> > Dmitriy,
>>>> >>>> >
>>>> >>>> > The same approach is used for all async read/write/delete
>>>> >>>> > operations: the Cassandra session just provides an
>>>> >>>> > executeAsync(statement) function for all types of operations.
>>>> >>>> >
>>>> >>>> > To be more detailed about Cassandra batches, there are actually
>>>> >>>> > two types of batches:
>>>> >>>> >
>>>> >>>> > 1) *Logged batch* (aka atomic) - the main purpose of such
>>>> >>>> > batches is to keep duplicated data in sync while updating
>>>> >>>> > multiple tables, but at the cost of performance.
>>>> >>>> >
>>>> >>>> > 2) *Unlogged batch* - the only specific case for such a batch is
>>>> >>>> > when all updates are addressed to only *one* partition key and
>>>> >>>> > the batch has a "*reasonable size*". In such a situation there
>>>> >>>> > *could be* performance benefits if you are using the Cassandra
>>>> >>>> > *TokenAware* load balancing policy. In this particular case all
>>>> >>>> > the updates will go directly, without any additional
>>>> >>>> > coordination, to the primary node which is responsible for
>>>> >>>> > storing the data for this partition key.
>>>> >>>> >
>>>> >>>> > The *generic rule* is that *individual updates using async
>>>> >>>> > mode* provide the best performance (
>>>> >>>> > https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html).
>>>> >>>> > That's because they spread all updates across the whole cluster.
>>>> >>>> > In contrast, when you are using batches, what you are actually
>>>> >>>> > doing is putting a huge amount of pressure on a single
>>>> >>>> > coordinator node. This is because the coordinator needs to
>>>> >>>> > forward each individual insert/update/delete to the correct
>>>> >>>> > replicas. In general, you're just losing all the benefit of the
>>>> >>>> > Cassandra TokenAware load balancing policy when you're updating
>>>> >>>> > different partitions in a single round trip to the database.
>>>> >>>> >
>>>> >>>> > Probably the only enhancement which could be done is to split
>>>> >>>> > our batch into smaller batches, each of which updates records
>>>> >>>> > having the same partition key. In this case it could provide
>>>> >>>> > some performance benefit when used in combination with the
>>>> >>>> > Cassandra TokenAware policy. But there are several concerns:
>>>> >>>> >
>>>> >>>> > 1) It looks like a rather rare case.
>>>> >>>> > 2) It makes error handling more complex - you just don't know
>>>> >>>> > which operations in a batch succeeded and which failed, and you
>>>> >>>> > need to retry the whole batch.
>>>> >>>> > 3) Retry logic could produce more load on the cluster - with
>>>> >>>> > individual updates you only need to retry the mutations that
>>>> >>>> > failed; with batches you need to retry the whole batch.
>>>> >>>> > 4) *Unlogged batch is deprecated in Cassandra 3.0* (
>>>> >>>> > https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html),
>>>> >>>> > which is the version we are currently using for the Ignite
>>>> >>>> > Cassandra module.
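The per-partition splitting discussed here could be sketched as follows. `Mutation`, `partitionKey`, and the size cap are illustrative names and assumptions, not the actual Ignite Cassandra module API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: group mutations by partition key and cap each group at a
// "reasonable size", so each resulting batch touches a single partition.
public class PerPartitionBatches {
    record Mutation(String partitionKey, String payload) {}

    static List<List<Mutation>> split(List<Mutation> mutations, int maxBatchSize) {
        // Group mutations by partition key, preserving encounter order.
        Map<String, List<Mutation>> byPartition = new LinkedHashMap<>();
        for (Mutation m : mutations)
            byPartition.computeIfAbsent(m.partitionKey(), k -> new ArrayList<>()).add(m);

        // Chop each per-partition group into batches of at most maxBatchSize.
        List<List<Mutation>> batches = new ArrayList<>();
        for (List<Mutation> group : byPartition.values())
            for (int i = 0; i < group.size(); i += maxBatchSize)
                batches.add(group.subList(i, Math.min(i + maxBatchSize, group.size())));
        return batches;
    }

    public static void main(String[] args) {
        List<Mutation> ms = List.of(
            new Mutation("p1", "a"), new Mutation("p1", "b"), new Mutation("p1", "c"),
            new Mutation("p2", "d"));
        // p1 split into batches of <= 2, p2 alone: 3 batches total.
        System.out.println(split(ms, 2).size()); // 3
    }
}
```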
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > Igor Rudyak
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan
<
>>>> >>>> dsetrakyan@apache.org>
>>>> >>>> > wrote:
>>>> >>>> >
>>>> >>>> > >
>>>> >>>> > >
>>>> >>>> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <
>>>> irudyak@gmail.com>
>>>> >>>> wrote:
>>>> >>>> > >
>>>> >>>> > >> Hi Valentin,
>>>> >>>> > >>
>>>> >>>> > >> For writeAll/readAll the Cassandra cache store implementation
>>>> >>>> > >> uses async operations (
>>>> >>>> > >> http://www.datastax.com/dev/blog/java-driver-async-queries)
>>>> >>>> > >> and futures, which have the best characteristics in terms of
>>>> >>>> > >> performance.
>>>> >>>> > >>
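The fire-all-then-wait pattern described here looks roughly like this. It is only a sketch: `executeAsyncStub` stands in for the DataStax driver's `Session.executeAsync`, and the real store collects driver futures rather than plain CompletableFutures:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of async per-entry writes: send one asynchronous statement per
// entry, then wait on all futures, instead of one big batch.
public class AsyncWrites {
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // Assumed stand-in for the driver's executeAsync(statement).
    static CompletableFuture<String> executeAsyncStub(String statement) {
        return CompletableFuture.supplyAsync(() -> "applied:" + statement, POOL);
    }

    static List<String> writeAll(List<String> statements) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String s : statements)
            futures.add(executeAsyncStub(s));   // send everything first...
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures)
            results.add(f.join());              // ...then collect the results
        return results;
    }

    public static void main(String[] args) {
        List<String> res = writeAll(List.of("INSERT k1", "INSERT k2", "INSERT k3"));
        System.out.println(res.size()); // 3
        POOL.shutdown();
    }
}
```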
>>>> >>>> > >>
>>>> >>>> > > Thanks, Igor. This link describes the query operations, but I
>>>> >>>> > > could not find any mention of writes.
>>>> >>>> > >
>>>> >>>> > >
>>>> >>>> > >> The Cassandra BATCH statement is actually quite often an
>>>> >>>> > >> anti-pattern for those who come from the relational world.
>>>> >>>> > >> The BATCH statement concept in Cassandra is totally different
>>>> >>>> > >> from the relational world and is not for optimizing
>>>> >>>> > >> batch/bulk operations. The main purpose of Cassandra BATCH is
>>>> >>>> > >> to keep denormalized data in sync, for example when you
>>>> >>>> > >> duplicate the same data into several tables. All other cases
>>>> >>>> > >> are not recommended for Cassandra batches:
>>>> >>>> > >>  - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
>>>> >>>> > >>  - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
>>>> >>>> > >>  - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>>> >>>> > >>
>>>> >>>> > >> It's also worth mentioning that in the CassandraCacheStore
>>>> >>>> > >> implementation (actually in CassandraSessionImpl) every
>>>> >>>> > >> Cassandra operation is wrapped in a loop. The reason is that
>>>> >>>> > >> in case of failure, up to 20 attempts will be made to retry
>>>> >>>> > >> the operation, with incrementally increasing timeouts
>>>> >>>> > >> starting from 100ms, plus specific exception handling logic
>>>> >>>> > >> (Cassandra host unavailability, etc.). Thus it provides a
>>>> >>>> > >> quite reliable persistence mechanism. According to load
>>>> >>>> > >> tests, even on a heavily overloaded Cassandra cluster (CPU
>>>> >>>> > >> load > 10 per core) there were no lost writes/reads/deletes,
>>>> >>>> > >> and at most 6 attempts were needed to perform one operation.
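The retry loop described here has roughly this shape. It is a simplified sketch: the real CassandraSessionImpl logic also distinguishes driver-specific exceptions (host unavailability, etc.), which is omitted:

```java
import java.util.function.Supplier;

// Sketch of a retry loop: up to 20 attempts with incrementally increasing
// sleeps starting at 100 ms, matching the behavior described in the thread.
public class RetryLoop {
    static <T> T withRetries(Supplier<T> op, int maxAttempts, long initialDelayMs) {
        RuntimeException last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                    delay += initialDelayMs; // incrementally increasing timeout
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 20, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```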
>>>> >>>> > >>
>>>> >>>> > >
>>>> >>>> > > I think that the main point about Cassandra batch operations
>>>> >>>> > > is not reliability, but performance. If a user batches up
>>>> >>>> > > hundreds of updates in one Cassandra batch, then it will be a
>>>> >>>> > > lot faster than doing them one by one in Ignite. Wrapping them
>>>> >>>> > > into an Ignite "putAll(...)" call just seems more logical to
>>>> >>>> > > me, no?
>>>> >>>> > >
>>>> >>>> > >
>>>> >>>> > >>
>>>> >>>> > >> Igor Rudyak
>>>> >>>> > >>
>>>> >>>> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin
Kulichenko <
>>>> >>>> > >> valentin.kulichenko@gmail.com> wrote:
>>>> >>>> > >>
>>>> >>>> > >> > Hi Igor,
>>>> >>>> > >> >
>>>> >>>> > >> > I noticed that the current Cassandra store implementation
>>>> >>>> > >> > doesn't support batching for the writeAll and deleteAll
>>>> >>>> > >> > methods; it simply executes all updates one by one
>>>> >>>> > >> > (asynchronously in parallel).
>>>> >>>> > >> >
>>>> >>>> > >> > I think it can be useful to provide such support, and I
>>>> >>>> > >> > created a ticket [1]. Can you please give your input on
>>>> >>>> > >> > this? Does it make sense in your opinion?
>>>> >>>> > >> >
>>>> >>>> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
>>>> >>>> > >> >
>>>> >>>> > >> > -Val
>>>> >>>> > >> >
>>>> >>>> > >>
>>>> >>>> > >
>>>> >>>> > >
>>>> >>>> >
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
