incubator-cassandra-user mailing list archives

From Ran Tavory <ran...@gmail.com>
Subject Re: understanding the cassandra storage scaling
Date Thu, 09 Dec 2010 10:50:11 GMT
>
> So is it not true that each node contains all the data in the cluster?

No, not in the general case; in fact, it is rarely the case. Usually R < N.
In my case I have N=6 and R=2.
You configure R per keyspace under ReplicationFactor (v0.6.*)
or replication_factor (v0.7.*).
http://wiki.apache.org/cassandra/StorageConfiguration
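
To make the numbers concrete, here is a minimal Python sketch. It is an
illustration under stated assumptions, not Cassandra's actual code: it
assumes an evenly balanced ring, MD5-hashed keys, one token per node, and
replicas placed on the next R-1 nodes clockwise. The node names and
function names are hypothetical. With the 5 TB example from this thread,
N=6 and R=2 give roughly 5 * 2 / 6 = 1.67 TB per node rather than 5 TB.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # assumed size of the token ring for this sketch


def expected_load_per_node(total_data_tb, n_nodes, replication_factor):
    """Average data per node, assuming the ring is evenly balanced."""
    return total_data_tb * replication_factor / n_nodes


def build_ring(n_nodes):
    """Evenly spaced tokens, one per node (node names are hypothetical)."""
    return [(i * TOKEN_SPACE // n_nodes, f"node{i}") for i in range(n_nodes)]


def replicas_for(key, ring, replication_factor):
    """Nodes that hold `key` under the simplified placement assumed here."""
    # Hash the key onto the ring.
    token = int(hashlib.md5(key.encode()).hexdigest(), 16) % TOKEN_SPACE
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token) % len(ring)
    # That node plus the next R-1 nodes clockwise.
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    ring = build_ring(6)
    print(expected_load_per_node(5, 6, 2))   # TB per node for N=6, R=2
    print(replicas_for("user:42", ring, 2))  # the two nodes holding this key
```

Under these assumptions any node can work out which R nodes own a key just
by hashing it, which is why no node needs to hold the whole data set.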


On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby <jonathan.colby@gmail.com> wrote:

> Thanks Ran.  This helps a little, but unfortunately it's still a bit
> fuzzy for me.  So is it not true that each node contains all the data
> in the cluster? I haven't come across any information on how clustered
> data is coordinated in Cassandra.  How does my query get directed to
> the right node?
>
> On Thu, Dec 9, 2010 at 11:35 AM, Ran Tavory <rantav@gmail.com> wrote:
> > There are two numbers to look at: N, the number of hosts in the ring
> > (cluster), and R, the number of replicas for each data item. R is
> > configurable per keyspace.
> > Typically for large clusters N >> R. For very small clusters it makes
> > sense for R to be close to N, in which case Cassandra is useful so that
> > the database doesn't have a single point of failure, but not so much
> > because of the size of the data. But for large clusters it rarely makes
> > sense to have N=R; usually N >> R.
> >
> > On Thu, Dec 9, 2010 at 12:28 PM, Jonathan Colby <jonathan.colby@gmail.com> wrote:
> >>
> >> I have a very basic question whose answer I have been unable to find
> >> in the online documentation on Cassandra.
> >>
> >> It seems like every node in a Cassandra cluster contains all the data
> >> ever stored in the cluster (i.e., all nodes are identical).  I don't
> >> understand how you can scale this on commodity servers with merely
> >> internal hard disks.  In other words, if I want to store 5 TB of
> >> data, does each node need a hard disk capacity of 5 TB?
> >>
> >> With HBase, memcached and other NoSQL solutions it is clearer how
> >> data is split up in the cluster and replicated for fault tolerance.
> >> Again, please excuse the rather basic question.
> >
> >
> >
> > --
> > /Ran
> >
>



-- 
/Ran
