cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Assigned] (CASSANDRA-7552) Compactions Pending build up when using LCS
Date Tue, 15 Jul 2014 21:23:05 GMT


Jonathan Ellis reassigned CASSANDRA-7552:

    Assignee: Yuki Morishita

> Compactions Pending build up when using LCS
> -------------------------------------------
>                 Key: CASSANDRA-7552
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Darla Baker
>            Assignee: Yuki Morishita
> We seem to be hitting an issue with LeveledCompactionStrategy while running performance
tests on a 4-node Cassandra installation. We are currently using Cassandra 2.0.7.
> In summary, we run a test consisting of approximately 8,000 inserts/sec, 16,000 gets/sec,
and 8,000 deletes/sec. We have a grace period of 12 hours on our column families.
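
To put the delete rate and the grace period together, here is a rough back-of-the-envelope sketch. It assumes the delete rate stays constant, that each delete writes one tombstone, and that the "grace period" above is the column family's gc_grace setting. The replication factor is not stated in the report, so it is left out. This only sizes the workload; it is not a diagnosis of the issue.

```python
# Workload figures quoted in the report.
DELETES_PER_SEC = 8_000
GC_GRACE_HOURS = 12


def tombstones_within_grace(deletes_per_sec: int, gc_grace_hours: float) -> int:
    """Deletes issued during one grace window (assumes one tombstone per delete)."""
    return int(deletes_per_sec * gc_grace_hours * 3600)


# The grace period expressed in seconds.
gc_grace_seconds = GC_GRACE_HOURS * 3600

# Tombstones written cluster-wide that are still inside the grace window,
# before replication is taken into account.
backlog = tombstones_within_grace(DELETES_PER_SEC, GC_GRACE_HOURS)

print(gc_grace_seconds)  # 43200
print(backlog)           # 345600000
```

Under these assumptions, roughly 345.6 million tombstones are inside the grace window at any given time once the test has run for 12 hours.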
> At this rate, we observe a stable number of pending compaction tasks for about 22 to 26 hours.
After that period, something happens and the number of pending compaction tasks starts to increase rapidly,
sometimes on one or two servers, but sometimes on all four of them. This goes on until the
uncompacted SSTables start consuming all the disk space, after which the Cassandra cluster
generally fails.
> When this occurs, the rate of completed compaction tasks usually decreases over time,
which seems to indicate that the existing compaction tasks take longer and longer to run.
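
One way to flag the "pending tasks keep growing" symptom described above is to sample the pending count periodically (for example from the "pending tasks" line of `nodetool compactionstats`) and check for sustained growth. The sketch below is only an illustration: the function name, the window size, the growth threshold, and the sample values are all hypothetical and do not come from the report.

```python
from typing import Sequence


def is_falling_behind(pending: Sequence[int], window: int = 5, min_growth: int = 1) -> bool:
    """Return True if each of the last `window` samples exceeds the previous
    one by at least `min_growth`, i.e. the backlog grows at every step."""
    if window < 2 or len(pending) < window:
        return False
    recent = pending[-window:]
    return all(b - a >= min_growth for a, b in zip(recent, recent[1:]))


# Hypothetical samples, NOT measurements from the report.
stable = [12, 14, 11, 13, 12, 14]    # fluctuates around a steady level
growing = [12, 14, 20, 35, 61, 90]   # rises at every sample

print(is_falling_behind(stable))   # False
print(is_falling_behind(growing))  # True
```

A strict "rises at every step" rule is deliberately simple; a noisy real series would probably need smoothing or a slope threshold instead.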
> On other occasions, I can reproduce a similar issue in less than 12 hours. While
the traffic rate remains constant, we seem to be hitting this at varying intervals. Yesterday
I could reproduce it in less than 6 hours.
> We have two different deployments on which we have tested this issue: 
> 1. 4x IBM HS22, using a RAM disk as the Cassandra data directory (thus eliminating disk I/O)
> 2. 8x IBM HS23, with SSD disks, deployed in two "geo-redundant" data centers of 4 nodes
each, with a latency of 50 ms between the data centers.
> I can reproduce the "compaction tasks falling behind" issue on both of these setups, although it
could be occurring for different reasons on each. Because of #1, I do not believe we are hitting an
I/O bottleneck just yet.
> As an additional interesting note, if I artificially pause the traffic when I see the
pending compaction task issue occurring, then:
> 1. The pending compaction tasks stop increasing, as expected, but stay at the same number
for 15 minutes (as if nothing is running).
> 2. The completed compaction tasks fall to 0 for 15 minutes.
> 3. After 15 to 20 minutes, out of the blue, all compactions complete in less than 2 minutes.
> If I restart the traffic after that, the system is stable for a few hours, but the issue
always comes back.
> We have written a small test tool that reproduces our application's Cassandra interaction.
> We have not successfully run a test for more than 30 hours under load, and every failure
so far has followed a similar pattern.

This message was sent by Atlassian JIRA
