incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Really need some advices on large data considerations
Date Mon, 12 May 2014 08:33:56 GMT
> We've learned that the compaction strategy is an important point, since we've run into 'no space' trouble caused by the size-tiered compaction strategy.
If you want to get the most out of the raw disk space, LCS (leveled compaction) is the way to go; remember that it uses approximately twice the disk IO.
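
As a rough sketch of why the compaction strategy matters for 'no space' errors, the snippet below estimates how much data a node can safely hold under each strategy. The headroom fractions are assumptions based on commonly quoted rules of thumb (size-tiered compaction may temporarily need free space comparable to the data being compacted, leveled compaction far less), and the 4 TB disk is a hypothetical figure; measure on your own hardware before relying on them.

```python
# Assumed worst-case free-space headroom per compaction strategy.
# These are rules of thumb, not values read from Cassandra itself.
HEADROOM = {
    "STCS": 0.50,  # size-tiered: a large compaction can briefly need about as much space as its inputs
    "LCS": 0.10,   # leveled: compacts small fixed-size SSTables, so little temporary space
}


def usable_tb(raw_disk_tb: float, strategy: str) -> float:
    """Data (TB) a node can hold while keeping the assumed compaction headroom free."""
    return raw_disk_tb * (1.0 - HEADROOM[strategy])


if __name__ == "__main__":
    raw = 4.0  # hypothetical disk size per node, in TB
    for strategy in ("STCS", "LCS"):
        print(f"{strategy}: ~{usable_tb(raw, strategy):.1f} TB usable of {raw} TB")
```

The extra usable space with LCS is paid for in compaction IO, which is the roughly 2x disk IO trade-off noted above.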

> From our experience, changing any settings or schema while a large cluster is online and has been running for some time is really, really a pain.
Which parts in particular?

Updating the schema or the config? OpsCenter has a rolling restart feature, which can be handy when Chef or Puppet is deploying the config changes. Schema changes and gossip state can take a little while to propagate with a high number of nodes.
 
On a modern version you should be able to run 2 to 3 TB per node, maybe higher. The biggest concerns are going to be repair (the changes in 2.1 will help) and bootstrapping. I’d recommend testing with a smaller cluster, say 12 nodes, under a high per-node load of around 3 TB.
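
To put the 2 to 3 TB per-node guideline in perspective for a petabyte-scale deployment, here is back-of-the-envelope arithmetic for the minimum node count. The 1 PB of unique data and the replication factor of 3 are hypothetical inputs, and the sketch ignores compaction headroom, compression and growth.

```python
import math


def nodes_needed(unique_data_tb: float, replication_factor: int, load_per_node_tb: float) -> int:
    """Smallest node count that keeps each node's share of replicated data at or under load_per_node_tb."""
    total_on_disk_tb = unique_data_tb * replication_factor
    return math.ceil(total_on_disk_tb / load_per_node_tb)


if __name__ == "__main__":
    # Hypothetical: 1 PB (1000 TB, decimal) of unique data, replication factor 3.
    for per_node in (2.0, 3.0):
        print(f"{per_node} TB/node -> {nodes_needed(1000.0, 3, per_node)} nodes")
    # A 12-node test cluster at ~3 TB per node holds ~36 TB on disk.
    print(f"12-node test cluster: ~{12 * 3.0:.0f} TB on disk")
```

Raising the per-node load lowers the node count, but each node then has more data to stream during bootstrap and to validate during repair, which is why those operations are worth exercising on the test cluster.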

cheers
Aaron
 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 12:09 pm, Yatong Zhang <blueflycn@gmail.com> wrote:

> Hi,
> 
> We're going to deploy a large Cassandra cluster at petabyte (PB) scale. Our scenario is:
> 
> 1. Lots of writes: about 150 writes/second on average, each about 300 KB.
> 2. A relatively small volume of reads
> 3. Our data will never be updated
> 4. But we will delete old data periodically to free space for new data
> 
> We've learned that the compaction strategy is an important point, since we've run into 'no space' trouble caused by the size-tiered compaction strategy.
> 
> We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations; is it sufficient and up to date? From our experience, changing any settings or schema while a large cluster is online and has been running for some time is really, really a pain. So we're gathering more information and hoping for more practical suggestions before we set up the Cassandra cluster.

> 
> Thanks; any help is greatly appreciated.
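
The workload described in the original message (about 150 writes per second of roughly 300 KB each, with old data deleted periodically) translates into the ingest rate and retention window below. This sketch assumes decimal units (1 KB = 1000 bytes), no compression and no replication; the 1 PB capacity is a hypothetical figure.

```python
SECONDS_PER_DAY = 86_400


def ingest_tb_per_day(writes_per_sec: float, kb_per_write: float) -> float:
    """Unique (pre-replication) data written per day, in decimal TB."""
    kb_per_day = writes_per_sec * kb_per_write * SECONDS_PER_DAY
    return kb_per_day / 1e9  # 1 TB = 10**9 KB in decimal units


def retention_days(capacity_tb: float, tb_per_day: float) -> float:
    """Days of data that fit before old data must be deleted to make room."""
    return capacity_tb / tb_per_day


if __name__ == "__main__":
    daily = ingest_tb_per_day(150, 300)  # ~45 MB/s sustained
    print(f"~{daily:.2f} TB/day of unique data")
    print(f"~{retention_days(1000, daily):.0f} days to fill a hypothetical 1 PB")
```

Multiply by the replication factor for the on-disk write rate; at a replication factor of 3 that is roughly 11.7 TB/day.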

