From: Jake Luciani <jakers@gmail.com>
Date: Fri, 20 Apr 2012 11:05:21 -0400
Subject: Re: 200TB in Cassandra ?
To: user@cassandra.apache.org

What other solutions are you considering? Any OLTP-style access to 200TB of
data will require substantial IO.

Do you know how big your working dataset will be?
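Franc's arithmetic further down the thread is easy to rerun with different
inputs, which helps when weighing per-node storage targets or replication
factor. A back-of-envelope sketch in Python using the thread's own figures
(the 600GB-per-node number is a rule of thumb from Franc's post, not a hard
Cassandra limit):

    def nodes_needed(raw_tb, replication_factor, per_node_gb):
        # Rough cluster sizing: total replicated data over per-node capacity.
        total_gb = raw_tb * 1000 * replication_factor  # 1TB ~ 1000GB, as in the thread
        return total_gb / per_node_gb

    # Figures from Franc's post: 200TB raw, RF=3, 600GB per node.
    print(nodes_needed(200, 3, 600))  # -> 1000.0

The count scales inversely with per-node storage, so anything that shrinks
the on-disk size, such as the 1.X compression Aaron points to below, cuts
the node count by the same factor: at an effective 1.2TB per node the same
data fits on 500 nodes.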
-Jake

On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter <franc.carter@sirca.org.au> wrote:

> On Fri, Apr 20, 2012 at 6:27 AM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> Couple of ideas:
>>
>> * take a look at compression in 1.X
>>   http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
>> * is there repetition in the binary data? Can you save space by
>>   implementing content addressable storage?
>
> The data is already very highly space optimised. We've come to the
> conclusion that Cassandra is probably not the right fit for the use case
> this time.
>
> cheers
>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/04/2012, at 12:55 AM, Dave Brosius wrote:
>>
>> I think your math is 'relatively' correct. It would seem to me you
>> should focus on how you can reduce the amount of storage you are using
>> per item, if at all possible, if that node count is prohibitive.
>>
>> On 04/19/2012 07:12 AM, Franc Carter wrote:
>>
>> Hi,
>>
>> One of the projects I am working on is going to need to store about
>> 200TB of data - generally in manageable binary chunks. However, after
>> doing some rough calculations based on rules of thumb I have seen for
>> how much storage should be on each node, I'm worried.
>>
>>   200TB with RF=3 is 600TB = 600,000GB
>>   Which is 1000 nodes at 600GB per node
>>
>> I'm hoping I've missed something, as 1000 nodes is not viable for us.
>>
>> cheers
>>
>> --
>> Franc Carter | Systems architect | Sirca Ltd
>> franc.carter@sirca.org.au | www.sirca.org.au
>> Tel: +61 2 9236 9118
>> Level 9, 80 Clarence St, Sydney NSW 2000
>> PO Box H58, Australia Square, Sydney NSW 1215
>
> --
> Franc Carter | Systems architect | Sirca Ltd
> franc.carter@sirca.org.au | www.sirca.org.au
> Tel: +61 2 9236 9118
> Level 9, 80 Clarence St, Sydney NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215

--
http://twitter.com/tjake
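On Aaron's second suggestion: content addressable storage means keying each
binary chunk by a digest of its bytes, so identical chunks are stored only
once and every item that shares a chunk shares one copy. A minimal sketch of
the idea in Python, assuming a plain in-memory put/get store rather than any
particular Cassandra client API:

    import hashlib

    class ContentAddressableStore:
        # Chunks are keyed by the SHA-256 of their contents, so writing
        # the same bytes twice costs no extra space.
        def __init__(self):
            self._chunks = {}  # digest -> bytes; stands in for a real backend

        def put(self, chunk):
            digest = hashlib.sha256(chunk).hexdigest()
            self._chunks.setdefault(digest, chunk)  # duplicate writes are no-ops
            return digest  # callers keep only the digest as a reference

        def get(self, digest):
            return self._chunks[digest]

The saving is exactly the duplicate fraction of the data, which is why Aaron
asks whether there is repetition in the binary chunks; as Franc notes above,
this data set is already highly space-optimised, so there was little left to
reclaim.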