incubator-cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: Running multiple instances on a single server --micrandra ??
Date Thu, 09 Dec 2010 23:07:50 GMT
Overall, I don't think this is a crazy idea, though I think I'd prefer
cassandra to manage this setup.

The problem you will run into is that, because the storage port is
assumed to be the same across the cluster, you'll only be able to do
this if you can assign multiple IPs to each server, one for each
process. (I know this because I proposed something similar last year
:)).
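
To illustrate the constraint, here is a minimal sketch (not anything
Cassandra ships) that plans one listen address per process on a single
server while keeping one cluster-wide storage port. The starting address,
the port value 7000, the disk paths, and the helper name `plan_instances`
are all assumptions made for the example.

```python
import ipaddress

# Assumed value; what matters is that it is identical on every process.
STORAGE_PORT = 7000


def plan_instances(first_ip, disks):
    """Give each per-disk process its own IP, all sharing STORAGE_PORT."""
    base = ipaddress.ip_address(first_ip)
    return [
        {
            "listen_address": str(base + i),  # one distinct IP per process
            "storage_port": STORAGE_PORT,     # same port for all of them
            "data_dir": disk,
        }
        for i, disk in enumerate(disks)
    ]


# Hypothetical 6-disk server: six processes, six addresses, one port.
plan = plan_instances("10.0.0.10", [f"/mnt/disk{n}" for n in range(1, 7)])
for inst in plan:
    print(inst["listen_address"], inst["storage_port"], inst["data_dir"])
```

Since the port cannot vary per process, the address is the only thing left
to tell the processes apart, hence one IP each.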

-ryan

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> The major downside is you're going to want to let each instance have
> its own dedicated commitlog spindle too, unless you just don't have
> many updates.
>
> On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> I am quite ready to be stoned for this thread but I have been thinking
>> about this for a while and I just wanted to bounce these ideas off some
>> gurus.
>>
>> Cassandra does allow multiple data directories, but as far as I can
>> tell no one runs in this configuration. This is something that is very
>> different between the hbase architecture and the Cassandra
>> architecture. HBase borrows the concept from hadoop of JBOD
>> configurations. HBase has many smallish (~256 MB) regions managed
>> with Zookeeper. Cassandra has a few (1 per node) large, node-sized
>> Token Ranges managed by Gossip consensus.
>>
>> Let's say a node has six 300 GB disks. You have the options of RAID5,
>> RAID6, RAID10, or RAID0. The problem I have found with these
>> configurations is that major compactions (or even large minor ones) can
>> take a long time. Even if your disk is not heavily utilized this is a
>> lot of data to move through. Thus node joins take a long time. Node
>> moves take a long time.
>>
>> The idea behind "micrandra" is, for a 6-disk system, to run 6 instances
>> of Cassandra, one per disk. Use the RackAwareSnitch to make sure no
>> replicas live on the same node.
>>
>> The downsides
>> 1) we would have to manage 6x the instances of cassandra
>> 2) we would have some overhead for each JVM.
>>
>> The upsides ?
>> 1) A disk/instance failure only degrades the overall performance by
>> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
>> when down a disk)
>> 2) Moves and joins have less work to do
>> 3) Can scale up a single node by adding a single disk to an existing
>> system (assuming the RAM and CPU load is light)
>> 4) OPP would be "easier" to balance out hot spots (maybe not on this
>> one; I'm not an OPP user)
>>
>> What does everyone think? Does it ever make sense to run this way?
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
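
The rack-aware placement idea in the original proposal can be sketched as
follows: every instance on one physical server is labeled with the same
rack, so rack-aware replica placement would treat them as a single failure
unit. This is only an illustration; the host names, addresses, the helper
name `rack_per_host`, and the `ip=DC:RACK` output format (modeled on a
properties-file style topology) are assumptions, not something stated in
the thread.

```python
def rack_per_host(hosts, datacenter="DC1"):
    """Map every instance IP to a rack named after its physical host."""
    topology = {}
    # Sort hosts so rack numbering is stable from run to run.
    for rack_no, (_host, ips) in enumerate(sorted(hosts.items()), start=1):
        for ip in ips:
            topology[ip] = f"{datacenter}:RAC{rack_no}"
    return topology


# Hypothetical layout: two servers, two instances shown on each.
hosts = {
    "hostA": ["10.0.0.10", "10.0.0.11"],
    "hostB": ["10.0.1.10", "10.0.1.11"],
}
topology = rack_per_host(hosts)
for ip, location in sorted(topology.items()):
    print(f"{ip}={location}")
```

With a mapping like this, instances that share a server also share a rack
label, which is the property the proposal relies on to keep replicas of the
same data off one machine.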
