incubator-cassandra-user mailing list archives

From Benjamin Black <b...@b3k.us>
Subject Re: Integrity of batch_insert and also what about sharding?
Date Thu, 08 Apr 2010 01:40:16 GMT
That depends on your goals for fault tolerance and recovery time.  If
you use RAID1 (or other redundant configuration) you can tolerate disk
failure without Cassandra having to do repair.  For large data sets,
that can be a significant win.


b

On Wed, Apr 7, 2010 at 6:02 PM, banks <banksenus@gmail.com> wrote:
> Then from an IT standpoint, if I'm using an RF of 3, it stands to reason that
> running on RAID 1 makes sense, since RAID and RF achieve the same ends... it
> makes sense to stripe for speed and let Cassandra deal with redundancy, eh?
>
>
> On Wed, Apr 7, 2010 at 4:07 PM, Benjamin Black <b@b3k.us> wrote:
>>
>> On Wed, Apr 7, 2010 at 3:41 PM, banks <banksenus@gmail.com> wrote:
>> >
>> > 2. each cassandra node essentially has the same datastore as all nodes,
>> > correct?
>>
>> No.  The ReplicationFactor you set determines how many copies of a
>> piece of data you want.  If your number of nodes is higher than your
>> RF, as is common, you will not have the same data on all nodes.  The
>> exact set of nodes to which data is replicated is determined by the
>> row key, placement strategy, and node tokens.
>>
>> > So if I've got 3 terabytes of data and 3 cassandra nodes I'm
>> > eating 9tb on the SAN?  are there provisions for essentially sharding
>> > across
>> > nodes... so that each node only handles a given keyrange, if so where is
>> > the
>> > howto on that?
>> >
>>
>> Sharding is a concept from databases that don't have native
>> replication and so need a term to describe what they bolt on for the
>> functionality.  Distribution amongst nodes based on key ranges is how
>> Cassandra always operates.
>>
>>
>> b
>
>
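
The placement rule described above (replicas chosen from the row key, the placement strategy, and node tokens) can be sketched roughly as follows. This is a hypothetical illustration of SimpleStrategy-style ring placement, not Cassandra's actual code: the hash function, token values, and node names are all made up for the example.

```python
import bisect
import hashlib

def token_for(key: str) -> int:
    """Hash a row key onto the ring (a stand-in for Cassandra's partitioner)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def replicas(key: str, node_tokens: dict, rf: int) -> list:
    """SimpleStrategy-style placement: walk the ring clockwise from the
    key's token and take the first rf distinct nodes."""
    ring = sorted(node_tokens.items())          # (token, node) pairs, ring order
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

# Three nodes with made-up tokens; RF=2 means each key lands on exactly
# two distinct nodes, not on all three -- which is why node count > RF
# does not mean every node holds every row.
nodes = {0: "node-a", 2**126: "node-b", 2**127: "node-c"}
print(replicas("some-row-key", nodes, rf=2))
```

With RF equal to the node count (3 nodes, rf=3) every node would hold every row, which is the special case the question above assumed; in the common case RF < N, storage per node is roughly (data size x RF) / N.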
