From: aaron morton
To: user@cassandra.apache.org
Subject: Re: 200TB in Cassandra ?
Date: Fri, 20 Apr 2012 08:27:25 +1200

A couple of ideas:

* Take a look at compression in 1.x: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
* Is there repetition in the binary data? Can you save space by implementing content-addressable storage?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/04/2012, at 12:55 AM, Dave Brosius wrote:

> I think your math is 'relatively' correct. If that node count is prohibitive, it seems to me you should focus on reducing the amount of storage you use per item, if at all possible.
>
> On 04/19/2012 07:12 AM, Franc Carter wrote:
>>
>> Hi,
>>
>> One of the projects I am working on is going to need to store about 200TB of data, generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node, I'm worried.
>>
>>   200TB with RF=3 is 600TB = 600,000GB,
>>   which is 1000 nodes at 600GB per node.
>>
>> I'm hoping I've missed something, as 1000 nodes is not viable for us.
>>
>> cheers
>>
>> --
>> Franc Carter | Systems architect | Sirca Ltd
>> franc.carter@sirca.org.au | www.sirca.org.au
>> Tel: +61 2 9236 9118
>> Level 9, 80 Clarence St, Sydney NSW 2000
>> PO Box H58, Australia Square, Sydney NSW 1215
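Franc's back-of-the-envelope sizing, and the effect of the per-item savings Dave and Aaron suggest, can be sketched as a small Python function. The 600GB-per-node figure is Franc's rule of thumb; `nodes_needed` and its `storage_ratio` parameter are hypothetical, with `storage_ratio` standing for the fraction of raw data left after compression or de-duplication. It is an illustrative knob, not a measured or promised ratio.

```python
import math


def nodes_needed(raw_tb: float, replication_factor: int, per_node_gb: float,
                 storage_ratio: float = 1.0) -> int:
    """Rough node count for a cluster (hypothetical helper).

    raw_tb             -- unreplicated data size in decimal terabytes
    replication_factor -- number of copies kept (RF)
    per_node_gb        -- rule-of-thumb data load per node, in GB
    storage_ratio      -- assumed fraction of raw size left after compression
                          or de-duplication (1.0 means no savings)
    """
    # Convert TB to GB (decimal, 1TB = 1000GB), multiply by RF, apply savings.
    total_gb = raw_tb * 1000 * replication_factor * storage_ratio
    # Round up: a partial node's worth of data still needs a whole node.
    return math.ceil(total_gb / per_node_gb)


print(nodes_needed(200, 3, 600))       # 1000, matching Franc's estimate
print(nodes_needed(200, 3, 600, 0.5))  # 500, if stored data shrinks by half
```

Because the node count scales linearly with the stored size, any reduction in per-item storage translates directly into fewer nodes.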
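Aaron's content-addressable storage idea can be sketched as follows, assuming each binary chunk is keyed by the SHA-256 digest of its bytes so that identical chunks are stored only once. The `ContentAddressableStore` class and its methods are hypothetical and purely in-memory; in a Cassandra-backed version the digest would presumably serve as the row key, which is an assumption rather than something the thread specifies.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory sketch: chunks are keyed by the SHA-256 of
    their content, so duplicate chunks occupy storage only once."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself.
        key = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._chunks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._chunks[key]

    def stored_bytes(self) -> int:
        # Total bytes actually held, after de-duplication.
        return sum(len(chunk) for chunk in self._chunks.values())


store = ContentAddressableStore()
blobs = [b"A" * 1024, b"B" * 1024, b"A" * 1024]  # first and third are identical
keys = [store.put(blob) for blob in blobs]
print(keys[0] == keys[2])    # True: same content, same key
print(store.stored_bytes())  # 2048 rather than 3072
```

Items would then hold lists of chunk keys instead of the chunks themselves. How much this saves depends entirely on how much repetition the real binary data contains.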