incubator-cassandra-user mailing list archives

From banks <bankse...@gmail.com>
Subject Re: Integrity of batch_insert and also what about sharding?
Date Thu, 08 Apr 2010 01:02:57 GMT
Then from an IT standpoint, if I'm using an RF of 3, it stands to reason that
running on RAID 0 makes sense, since RAID mirroring and RF achieve the same
ends... it makes sense to stripe for speed and let Cassandra deal with
redundancy, eh?
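
To put rough numbers on the RAID-versus-RF trade-off, here is a minimal sketch of the disk arithmetic. The function name and the multiplier table are illustrative assumptions, not anything from Cassandra itself, and the estimate ignores compaction headroom, commit logs, and filesystem overhead.

```python
# Assumed raw-capacity multipliers: RAID 0 only stripes, so it adds no
# extra copies; RAID 1 mirrors every block once, doubling raw usage.
RAID_MULTIPLIER = {0: 1, 1: 2}


def raw_disk_needed_tb(data_tb, replication_factor, raid_level):
    """Estimate cluster-wide raw disk (in TB) for a given data size.

    Cassandra stores `replication_factor` copies of each row, and the
    RAID level multiplies that again at the disk layer.
    """
    return data_tb * replication_factor * RAID_MULTIPLIER[raid_level]


if __name__ == "__main__":
    for level in (0, 1):
        total = raw_disk_needed_tb(3, 3, level)
        print(f"3 TB of data, RF=3, RAID {level}: {total} TB raw")
```

With 3 TB of data and RF=3, striping alone needs 9 TB of raw disk across the cluster, while mirroring on top of replication needs 18 TB.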


On Wed, Apr 7, 2010 at 4:07 PM, Benjamin Black <b@b3k.us> wrote:

> On Wed, Apr 7, 2010 at 3:41 PM, banks <banksenus@gmail.com> wrote:
> >
> > 2. each cassandra node essentially has the same datastore as all nodes,
> > correct?
>
> No.  The ReplicationFactor you set determines how many copies of a
> piece of data you want.  If your number of nodes is higher than your
> RF, as is common, you will not have the same data on all nodes.  The
> exact set of nodes to which data is replicated is determined by the
> row key, placement strategy, and node tokens.
>
> > So if I've got 3 terabytes of data and 3 cassandra nodes I'm
> > eating 9tb on the SAN?  are there provisions for essentially sharding
> > across nodes... so that each node only handles a given keyrange, if so
> > where is the howto on that?
> >
>
> Sharding is a concept from databases that don't have native
> replication and so need a term to describe what they bolt on for the
> functionality.  Distribution amongst nodes based on key ranges is how
> Cassandra always operates.
>
>
> b
>
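
For anyone following the thread, the placement rule described above (row key, placement strategy, node tokens) can be sketched roughly as follows. This is a simplified illustration, not Cassandra's actual code: it assumes an MD5-based token for the row key, a ring of evenly spaced node tokens, and a rack-unaware strategy that puts replicas on the next nodes clockwise around the ring. The node names and helper functions are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(row_key):
    """Map a row key to a position on the ring via its MD5 hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_ring(node_names):
    """Build a sorted list of (token, node_name) pairs, evenly spaced."""
    count = len(node_names)
    return [(i * RING_SIZE // count, name) for i, name in enumerate(node_names)]


def replicas_for(row_key, ring, replication_factor):
    """Return the nodes holding copies of row_key.

    The first replica is the node whose token is the first one at or
    after the key's token (wrapping around the ring); the remaining
    replicas are the next nodes clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    copies = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(copies)]


if __name__ == "__main__":
    ring = evenly_spaced_ring(["node1", "node2", "node3", "node4", "node5", "node6"])
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", replicas_for(key, ring, 3))
```

With six nodes and RF=3, each key lands on only three of the nodes, so no node holds the full data set. With three nodes and RF=3, every node ends up holding every row, which is the 9 TB case from the original question.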
