Date: Thu, 9 Dec 2010 11:28:29 +0100
Subject: understanding the cassandra storage scaling
From: Jonathan Colby
To: user@cassandra.apache.org

I have a very basic question that I have been unable to answer from the online documentation on Cassandra.

It seems like every node in a Cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how this can scale on commodity servers that have only internal hard disks. In other words, if I want to store 5 TB of data, does each node need 5 TB of disk capacity?

With HBase, memcached, and other NoSQL solutions, it is clearer how data is split up across the cluster and replicated for fault tolerance.

Again, please excuse the rather basic question.