incubator-cassandra-user mailing list archives

From Theo Hultberg <t...@iconara.net>
Subject Re: Performance issues with CQL3 collections?
Date Fri, 28 Jun 2013 05:30:32 GMT
what I was doing was definitely triggering the range tombstone issue.
this was the statement:

    UPDATE clocks SET clock = ? WHERE shard = ?

in this table:

    CREATE TABLE clocks (shard INT PRIMARY KEY, clock MAP<TEXT, TIMESTAMP>)

however, from the stack overflow posts it sounds like they aren't
necessarily overwriting their collections. I've tried to replicate their
problem with these two statements

    INSERT INTO clocks (shard, clock) VALUES (?, ?)
    UPDATE clocks SET clock = clock + ? WHERE shard = ?

the first one should create range tombstones because it overwrites the
map on every insert, and the second should not because it adds to the map.
neither of those seems to have any performance issues, at least not on
inserts.
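
to make the difference between the two statements concrete, here is a toy model in Python. it is only a sketch of the behaviour described in this thread, not Cassandra's storage engine: `ToyPartition`, `overwrite` and `append` are made-up names, and the assumption is that an overwrite writes one range tombstone plus the new cells while an append writes only the new cells.

```python
# Toy model only: NOT Cassandra's storage engine. It just records what each
# kind of write appends, assuming the behaviour described in this thread.
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    cells: list = field(default_factory=list)             # (key, value, ts)
    range_tombstones: list = field(default_factory=list)  # marker timestamps
    _ts: int = 0

    def _next_ts(self):
        self._ts += 1
        return self._ts

    def overwrite(self, new_map):
        """Models INSERT / `SET clock = ?`: one marker, then the new cells."""
        ts = self._next_ts()
        self.range_tombstones.append(ts)
        self.cells.extend((k, v, ts) for k, v in new_map.items())

    def append(self, extra):
        """Models `SET clock = clock + ?`: only the new cells are written."""
        ts = self._next_ts()
        self.cells.extend((k, v, ts) for k, v in extra.items())

    def read(self):
        """Merge on read: drop cells older than the newest range marker."""
        latest_marker = max(self.range_tombstones, default=0)
        live = {}
        for k, v, ts in self.cells:
            if ts >= latest_marker:
                live[k] = v  # later cells win over earlier ones
        return live


overwritten, appended = ToyPartition(), ToyPartition()
for i in range(1000):
    overwritten.overwrite({"node": i})
    appended.append({"node": i})

# prints: 1000 0
print(len(overwritten.range_tombstones), len(appended.range_tombstones))
# prints: {'node': 999} {'node': 999}
print(overwritten.read(), appended.read())
```

in this model neither write method ever calls `read()`, so the markers only cost something when the data is read back, which matches the slow reads I saw.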

and it's the slowdown on inserts that confuses me. both of the stack overflow
questioners say that they saw a drop in insert performance. I never saw
that in my application, I just got slow reads (and Fabien's explanation
makes complete sense for that). I don't understand how insert performance
could be affected at all. I know that for non-counter columns cassandra
doesn't read before it writes, but is it the same for collections too? they
are a bit special, but how special are they?

T#


On Fri, Jun 28, 2013 at 7:04 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Can you provide details of the mutation statements you are running ? The
> Stack Overflow posts don't seem to include them.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/06/2013, at 5:58 AM, Theo Hultberg <theo@iconara.net> wrote:
>
> do I understand it correctly that collection modifications are done by
> reading the collection, writing a range tombstone that covers the
> collection, and then re-writing the whole collection? or is it just the
> modified parts of the collection that are covered by the range tombstones,
> but you still get massive amounts of them and it's just their number that
> is the problem?
>
> would this explain the slowdown of writes too? I guess it would if
> cassandra needed to read the collection before it wrote the new values,
> otherwise I don't understand how this affects writes, but that only says
> how much I know about how this works.
>
> T#
>
>
> On Wed, Jun 26, 2013 at 10:48 AM, Fabien Rousseau <fabien@yakaz.com> wrote:
>
>> Hi,
>>
>> I'm pretty sure that it's related to this ticket :
>> https://issues.apache.org/jira/browse/CASSANDRA-5677
>>
>> I'd be happy if someone tests this patch.
>> It should apply easily on 1.2.5 & 1.2.6
>>
>> After applying the patch, the current implementation is still used by
>> default. To switch, add the following line to your cassandra.yaml:
>> interval_tree_provider: IntervalTreeAvlProvider
>>
>> (Note that implementations should be interchangeable, because they share
>> the same serializers and deserializers)
>>
>> Also, please note that this patch has not been reviewed nor intensively
>> tested... So, it may not be "production ready"
>>
>> Fabien
>>
>> 2013/6/26 Theo Hultberg <theo@iconara.net>
>>
>>> Hi,
>>>
>>> I've seen a couple of people on Stack Overflow having problems with
>>> performance when they have maps that they continuously update, and in
>>> hindsight I think I might have run into the same problem myself (but I
>>> didn't suspect maps as the reason, and I only stopped using them by
>>> accident when I changed the design).
>>>
>>> Is there any reason that maps (or lists or sets) in particular would
>>> become a performance issue when they're heavily modified? As I've
>>> understood them they're not special, and shouldn't be any different
>>> performance wise than overwriting regular columns. Is there something
>>> different going on that I'm missing?
>>>
>>> Here are the Stack Overflow questions:
>>>
>>>
>>> http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981
>>>
>>>
>>> http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236
>>>
>>> yours,
>>> Theo
>>>
>>
>>
>>
>> --
>> Fabien Rousseau
>>  <aurore@yakaz.com>www.yakaz.com
>>
>
>
>
