cassandra-user mailing list archives

From Lerh Chuan Low <l...@instaclustr.com>
Subject Re: Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues
Date Tue, 12 Sep 2017 03:12:37 GMT
Hi Paul,

Agh, I don't have any experience with sstableofflinerelevel. Maybe Kurt
does, sorry.

Also, in case it wasn't obvious: to add the node back to the cluster once it
is done, run the same 3 commands with enable substituted for disable. Getting
through all the compactions will likely take longer than the hinted handoff
window, so do make sure you are querying Cassandra with strong consistency
after you rejoin the node. Good luck!
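
To make "strong consistency" concrete: the usual rule is that the number of
replicas acknowledging a read plus the number acknowledging a write must
exceed the replication factor, so every read overlaps at least one replica
that saw the latest write. Below is a rough sketch of that arithmetic. The
helper names are made up for illustration, and the replication factor of 3 in
the usage line is an assumption (the thread does not say what RF is in use).

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level.

    Hypothetical helper; covers only a few single-datacenter levels.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # majority of replicas
        "ALL": rf,
    }
    return levels[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    # Assumed replication factor of 3, purely for illustration.
    for read, write in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
        print(read, write, is_strongly_consistent(read, write, rf=3))
```

With QUORUM for both reads and writes, a rejoined node that missed hints
cannot be the only replica answering a read.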

Lerh

On 12 September 2017 at 11:53, Aaron Wykoff <infrastructureguru@gmail.com>
wrote:

> Unsubscribe
>
> On Mon, Sep 11, 2017 at 4:48 PM, Paul Pollack <paul.pollack@klaviyo.com>
> wrote:
>
>> Hi,
>>
>> We run a 48-node cluster that stores counts in wide rows. Each node uses
>> roughly 1TB of space on a 2TB EBS gp2 drive for its data directory, with
>> LeveledCompactionStrategy. We have been trying to bootstrap new nodes that
>> use a RAID 0 configuration over two 1TB EBS drives to raise the I/O
>> throughput cap from 160 MB/s to 250 MB/s (AWS limits). Every time a node finishes
>> streaming it is bombarded by a large number of compactions. We see CPU load
>> on the new node spike extremely high and CPU load on all the other nodes in
>> the cluster drop unreasonably low. Meanwhile, our app's write latency to
>> this cluster averages 10 seconds or more. We've already tried throttling
>> compaction throughput to 1 MB/s, and we've always had
>> concurrent_compactors set to 2, but the disk is still saturated. In every
>> case we have had to shut down the Cassandra process on the new node to
>> resume acceptable operations.
>>
>> We're currently upgrading all of our clients to use the 3.11.0 version of
>> the DataStax Python driver, which will allow us to add our next newly
>> bootstrapped node to a blacklist, hoping that if it doesn't accept writes
>> the rest of the cluster can serve them adequately (as is the case whenever
>> we turn down the bootstrapping node), and allow it to finish its
>> compactions.
>>
>> We were also interested in hearing if anyone has had much luck using the
>> sstableofflinerelevel tool, and if this is a reasonable approach for our
>> issue.
>>
>> One of my colleagues found a post where a user had a similar issue and
>> found that the bloom filters had an extremely high false positive ratio.
>> Although I didn't check that during any of these bootstrap attempts, it
>> seems to me that with this many compactions to do we're likely to
>> observe the same thing.
>>
>> Would appreciate any guidance anyone can offer.
>>
>> Thanks,
>> Paul
>>
>
>
