On Sat, Apr 21, 2012 at 1:05 AM, Jake Luciani <firstname.lastname@example.org> wrote:
What other solutions are you considering? Any OLTP-style access to 200TB of data will require substantial IO.
Do you know how big your working dataset will be?
-Jake

On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter <email@example.com> wrote:

On Fri, Apr 20, 2012 at 6:27 AM, aaron morton <firstname.lastname@example.org> wrote:

Couple of ideas:
* take a look at compression in 1.X http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
* is there repetition in the binary data? Can you save space by implementing content addressable storage? (a minimal sketch of the idea follows)
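To make the content addressable storage suggestion concrete, here is a minimal Java sketch of the keying idea only: derive the row key from a SHA-256 digest of the chunk, so identical chunks collapse to a single stored copy. The class and an in-memory map standing in for the Cassandra column family are illustrative assumptions, not anything from this thread.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative content-addressable store: the key is the SHA-256 of
    // the value, so duplicate chunks are written once. A HashMap stands
    // in for whatever column family would actually hold the chunks.
    public class ContentAddressableStore {
        private final Map<String, byte[]> store = new HashMap<>();

        // Store a chunk and return its content-derived key.
        public String put(byte[] chunk) throws NoSuchAlgorithmException {
            String key = digest(chunk);
            store.putIfAbsent(key, chunk); // identical content stored only once
            return key;
        }

        public byte[] get(String key) {
            return store.get(key);
        }

        // Hex-encode the SHA-256 digest of the chunk.
        private static String digest(byte[] data) throws NoSuchAlgorithmException {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        }
    }

How much this saves depends entirely on how much duplication exists in the chunks, which is exactly the question asked above.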
The data is already very highly space-optimised. We've come to the conclusion that Cassandra is probably not the right fit for the use case this time.
Cheers

On 20/04/2012, at 12:55 AM, Dave Brosius wrote:

I think your math is 'relatively' correct. It would seem to me that, if that node count is prohibitive, you should focus on reducing the amount of storage used per item, if at all possible.
On 04/19/2012 07:12 AM, Franc Carter wrote:

Hi,
One of the projects I am working on will need to store about 200TB of data, generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should sit on each node, I'm worried.
200TB with RF=3 is 600TB = 600,000GB.
Which is 1000 nodes at 600GB per node.
I'm hoping I've missed something, as 1000 nodes is not viable for us.
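For reference, a small Java sketch of the same arithmetic, using the numbers from this thread; the 2TB-per-node variant at the end is purely illustrative, to show how sensitive the node count is to the per-node figure:

    public class CapacityEstimate {
        public static void main(String[] args) {
            double rawTB = 200;          // dataset size from this thread
            int replicationFactor = 3;   // RF=3
            double perNodeGB = 600;      // rule-of-thumb per-node load

            double totalGB = rawTB * replicationFactor * 1000; // 600,000 GB
            System.out.printf("nodes at 600GB each: %.0f%n",
                    Math.ceil(totalGB / perNodeGB));           // 1000

            // Illustrative only: denser nodes shrink the cluster linearly.
            System.out.printf("nodes at 2TB each:   %.0f%n",
                    Math.ceil(totalGB / 2000));                // 300
        }
    }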
Franc Carter | Systems architect | Sirca Ltd
email@example.com | www.sirca.org.au
Tel: +61 2 9236 9118
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215