incubator-cassandra-user mailing list archives

From Yatong Zhang <bluefl...@gmail.com>
Subject Re: Really need some advices on large data considerations
Date Wed, 14 May 2014 01:13:49 GMT
Thank you, Aaron, but we're planning on about 20 TB per node. Is that feasible?
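
To put the per-node figure in context, here is a rough sketch of how many nodes a given capacity target implies at 3 TB versus 20 TB per node. The 1 PB total (1000 TB, decimal units) and the `replication_factor` parameter are illustrative assumptions; the thread only says "PB level" and does not state a replication factor. The function name is hypothetical.

```python
def nodes_needed(total_tb, per_node_tb, replication_factor=1):
    """Return the minimum node count that holds total_tb of unique data.

    Assumes data is spread evenly and ignores compaction headroom,
    compression, and free-space margins (all simplifications).
    """
    stored_tb = total_tb * replication_factor
    # Integer ceiling division: round up to a whole node.
    return -(-stored_tb // per_node_tb)


if __name__ == "__main__":
    TOTAL_TB = 1000  # assumption: "PB level" taken as exactly 1 PB
    for per_node in (3, 20):
        print(per_node, "TB/node ->", nodes_needed(TOTAL_TB, per_node), "nodes")
```

Fewer, denser nodes reduce the node count but make each repair or bootstrap move far more data, which is the trade-off raised in the reply quoted below.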


On Mon, May 12, 2014 at 4:33 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> We've learned that compaction strategy is an important point, since
> we've run into 'no space' trouble with the 'size-tiered' compaction
> strategy.
>
> If you want to get the most out of the raw disk space, LCS is the way to
> go, but remember it uses approximately twice the disk IO.
>
> In our experience, changing any settings/schema while a large cluster is
> online and has been running for some time is really a pain.
>
> Which parts in particular ?
>
> Updating the schema or config? OpsCenter has a rolling restart feature
> which can be handy when chef / puppet is deploying the config changes.
> Schema / gossip can take a little while to propagate with a high number of
> nodes.
>
> On a modern version you should be able to run 2 to 3 TB per node, maybe
> higher. The biggest concerns are going to be repair (the changes in 2.1
> will help) and bootstrapping. I’d recommend testing a smaller cluster, say
> 12 nodes, with a high load per node, say 3 TB.
>
> cheers
> Aaron
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 9/05/2014, at 12:09 pm, Yatong Zhang <blueflycn@gmail.com> wrote:
>
> Hi,
>
> We're going to deploy a large Cassandra cluster at the PB level. Our scenario
> would be:
>
> 1. Lots of writes, about 150 writes/second on average, at about 300 KB
> per write.
> 2. Relatively few reads
> 3. Our data will never be updated
> 4. But we will delete old data periodically to free space for new data
>
> We've learned that compaction strategy is an important point, since
> we've run into 'no space' trouble with the 'size-tiered' compaction
> strategy.
>
> We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations. Is
> this enough, and is it up to date? In our experience, changing any
> settings/schema while a large cluster is online and has been running for
> some time is really a pain. So we're gathering more info and
> hoping for some more practical suggestions before we set up the Cassandra
> cluster.
>
> Thanks, any help is greatly appreciated.
>
>
>
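
The workload figures in the original post (about 150 writes/second at about 300K per write) can be turned into a rough ingest rate. The sketch below assumes "300K" means 300 KB in decimal units and counts raw payload only, before replication, compression, or compaction overhead; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ingest_bytes_per_second(writes_per_second, bytes_per_write):
    """Raw payload written per second, before replication or compression."""
    return writes_per_second * bytes_per_write


def ingest_tb_per_day(writes_per_second, bytes_per_write):
    """Raw payload written per day, in decimal terabytes (1 TB = 1e12 bytes)."""
    per_second = ingest_bytes_per_second(writes_per_second, bytes_per_write)
    return per_second * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    writes = 150          # from the original post
    size = 300 * 1000     # assumption: 300K read as 300 KB (decimal)
    print(ingest_bytes_per_second(writes, size) / 1e6, "MB/s")
    print(ingest_tb_per_day(writes, size), "TB/day")
```

Under these assumptions the cluster takes in roughly 45 MB/s, or a little under 4 TB of new data per day, which gives a starting point for estimating how quickly nodes fill and how much old data the periodic deletes must reclaim.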
