cassandra-user mailing list archives

From Jeff Jirsa <>
Subject Re: Data Node Density
Date Fri, 15 Dec 2017 15:48:39 GMT
Typing this on a phone during my commute, please excuse the inevitable typos in what I expect
will be a long email because there’s nothing else for me to do right now. 

There are a few reasons people don’t typically recommend huge nodes, the biggest
being expansion and replacement. This question comes up from time to time, so here’s at
least one other explanation I’ve written in the past:

Streaming (the mechanism for bootstrap / rebuild / repair) doesn’t have a ton of retries
built in. The larger the amount of data to stream, the more opportunities there are for failures.
Streaming a terabyte probably succeeds just fine 99% of the time, 60TB probably much lower.
In 2.2 and newer, resumable bootstrap makes this slightly less of a concern (assuming it’s
implemented correctly).
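
To put rough numbers on that intuition, here is a minimal sketch. It assumes every terabyte
streams with the same 99% success rate and that failures are independent of each other; both
are simplifications made for illustration, not measured properties of Cassandra streaming, and
the function name is made up for this example.

```python
def stream_success_probability(per_tb_success: float, terabytes: float) -> float:
    """Chance that the whole transfer finishes without a single failure.

    Assumes each terabyte succeeds independently with the same probability,
    which is an illustrative simplification rather than a measured fact.
    """
    return per_tb_success ** terabytes


if __name__ == "__main__":
    # 1 TB is the baseline from the text, 20 TB is one node in the
    # proposed cluster, 60 TB is the whole cluster.
    for size_tb in (1, 20, 60):
        p = stream_success_probability(0.99, size_tb)
        print(f"{size_tb:>3} TB: {p:.1%} chance of an uninterrupted stream")
```

Under those assumptions a 20 TB node streams cleanly roughly four times out of five, and 60 TB
only a little better than half the time.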

There are also some internals in play. When you bootstrap a new node, we create a streaming
plan. To create that, we need to inspect all of the data files on disk, figure out which files
to transfer, figure out how much actual data that is (which involves interacting with the compression
info), queue them up, and send them. The other side then compresses the data again, recalculates
metadata, and writes it to disk.

The compression/metadata work runs single threaded per stream, so you’re typically bound by the
number of streams, which correlates to the number of sending hosts. If you
use vnodes, you can set the number of vnodes near how many cores/machines you’ll have, so
you end up with approximately as many streams as cores.
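
A back-of-the-envelope sketch of why the stream count matters follows. The 50 MB/s per-stream
rate is a placeholder, not a benchmark, and the model assumes throughput scales linearly with
the number of streams and that neither network nor disk becomes the bottleneck first. The
function name is hypothetical.

```python
def estimated_stream_hours(data_tb: float, streams: int, per_stream_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes over `streams` parallel streams.

    Assumes each stream sustains per_stream_mb_s and that streams scale
    linearly, which is an idealisation for illustration only.
    """
    total_mb = data_tb * 1_000_000          # decimal terabytes to megabytes
    seconds = total_mb / (streams * per_stream_mb_s)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical per-stream rate; measure your own before trusting any of this.
    rate_mb_s = 50
    for streams in (2, 16, 44):
        hours = estimated_stream_hours(20, streams, rate_mb_s)
        print(f"{streams:>2} streams: about {hours:.1f} hours for 20 TB")
```

With only two sending hosts and one stream each, 20 TB takes days under these assumptions; more
streams shrink that roughly in proportion until something else saturates.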

If you’ve already bought the hardware, you can try to make it work. You’ll need the heap
to be big enough to calculate the streaming plans, and you’ll want to think about how you
lay out the data directories (for JBOD to be safe you’ll need to be on 3.11, otherwise just
raid0 it). Alternatively, as someone mentioned on this list in the past few weeks, you can
try to add some extra IPs and run more than one Cassandra instance per host - doing so lets
you treat each of them as a smaller instance. If you do this you’ll need to use rack awareness
to make sure you don’t have multiple copies of data on the same machine, or a single hardware
failure could make you lose data.
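
One way to sanity-check a multi-instance layout before deploying it is sketched below. It assumes
you keep an inventory of (instance IP, physical host, rack) tuples yourself; the IPs, host names,
and rack names are invented, and the function is a planning aid, not something Cassandra provides.
The idea is that every instance on a given machine should report the same rack, and there should
be at least as many racks as replicas, so rack-aware placement never has to put two copies on one box.

```python
from collections import defaultdict


def layout_problems(instances, replication_factor):
    """Return a list of human-readable problems with a planned layout.

    instances: iterable of (instance_ip, physical_host, rack) tuples.
    All names here are hypothetical inventory data, not read from Cassandra.
    """
    racks_by_host = defaultdict(set)
    for _ip, host, rack in instances:
        racks_by_host[host].add(rack)

    problems = []
    # Instances on one machine must share a rack, otherwise rack-aware
    # placement may treat them as independent failure domains.
    for host, racks in sorted(racks_by_host.items()):
        if len(racks) > 1:
            problems.append(f"{host} hosts instances in several racks: {sorted(racks)}")

    # With fewer racks than replicas, some rack must hold two copies.
    all_racks = set().union(*racks_by_host.values())
    if len(all_racks) < replication_factor:
        problems.append(
            f"only {len(all_racks)} rack(s) for replication factor {replication_factor}"
        )
    return problems


if __name__ == "__main__":
    planned = [
        ("10.0.0.1", "hostA", "rackA"), ("10.0.0.2", "hostA", "rackA"),
        ("10.0.0.3", "hostB", "rackB"), ("10.0.0.4", "hostB", "rackB"),
        ("10.0.0.5", "hostC", "rackC"), ("10.0.0.6", "hostC", "rackC"),
    ]
    print(layout_problems(planned, replication_factor=3) or "layout looks safe")
```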

If you’re having specific problems trying to run a rebuild or bootstrap, you may have better
luck with subrange repair - you’ll stream less data, and you can do it in very small chunks.
Most importantly, if you’re having specific problems, don’t ask us if it works, tell us
what’s failing and show us the errors. 
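
As a sketch of what "very small chunks" can look like, the snippet below splits one token range
into equal pieces and prints a `nodetool repair` command for each, using the `-st` / `-et`
(start token / end token) options. The keyspace name and the token values are placeholders; in
practice you would feed it a range the node actually owns. It only prints the commands and does
not run anything.

```python
def split_token_range(start: int, end: int, pieces: int):
    """Split the range (start, end] into `pieces` contiguous subranges."""
    if pieces < 1 or end <= start:
        raise ValueError("need pieces >= 1 and end > start")
    step = (end - start) // pieces
    bounds = [start + i * step for i in range(pieces)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace: str, start: int, end: int, pieces: int):
    """Build one nodetool repair command per subrange (printed, not executed)."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_token_range(start, end, pieces)
    ]


if __name__ == "__main__":
    # Placeholder keyspace and token range, chosen only for illustration.
    for cmd in repair_commands("my_keyspace", 0, 1_000_000, pieces=4):
        print(cmd)
```

Smaller pieces mean each repair session streams less, so a failure costs you one chunk rather
than the whole range.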

Having an outside firm come in and help explain and troubleshoot this for you is probably
a good idea. The firms I’d personally trust if you were a close relative of mine asking
for help are TheLastPickle and Instaclustr, but there are also some very competent people
at Pythian and

Jeff Jirsa

> On Dec 15, 2017, at 6:37 AM, Amit Agrawal <> wrote:
> Thanks Nicolas. Am aware of the official recommendations. However, in the last project,
> we tried with 5 TB and it worked fine.
> So asking for experiences around.
> Anybody knows anyone who provides a consultancy on open source cassandra. Datastax just
> does it for the enterprise version!
>> On Fri, Dec 15, 2017 at 3:08 PM, Nicolas Guyomar <>
>> Hi Amit,
>> This is way too much data per node, official recommendations are to try to stay below
>> 2 TB per node, I have seen nodes up to 4 TB but then maintenance gets really complicated (backup,
>> bootstrap, streaming for repair etc etc)
>> Nicolas
>>> On 15 December 2017 at 15:01, Amit Agrawal <>
>>> Hi,
>>> We are trying to set up a 3 node cluster with 20 TB HD on each node.
>>> It's a bare metal setup with 44 cores on each node.
>>> So in total 60 TB, 66 cores , 3 node cluster.
>>> The data velocity is very low, with low access rates.
>>> Has anyone tried with this configuration?
>>> A bit urgent. 
>>> Regards,
>>> -A
