Have you considered Redis (http://code.google.com/p/redis/)?

It may be more suited to the master-slave configuration you are after.

- You can have a single master that takes all writes, replicate it to an intermediate "slave master", and have each web head run a local Redis instance that slaves from the slave master.
- Back up at the master or the slave master.
- Writes to the write master would make their way down to the web head slaves.
- Web heads only read from their local slave.
- Reads will all be served from memory, which is faster than going to disk
- Redis can store a lot of data in memory and also use disk (http://blogzawodny.com/2010/07/24/200000000-keys-in-redis-2-0-0-rc3/)
- Web heads would have to write to the master, not locally
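
To make the read/write split above a bit more concrete, here is a rough Python sketch of what the application side on a web head might look like. It assumes the redis-py client library; the class name SplitClient, the helper make_client and the hostname redis-master.example are all made up for illustration. The replication chain itself would be set up on the Redis side (each slave points at its upstream with a "slaveof <host> <port>" line in redis.conf), not in this code.

```python
class SplitClient:
    """Send writes to the remote write master and reads to the local slave.

    Hypothetical helper: both arguments are any objects exposing
    set(key, value) and get(key), such as redis-py clients.
    """

    def __init__(self, master, local):
        self.master = master  # client connected to the write master
        self.local = local    # client connected to this web head's own slave

    def set(self, key, value):
        # Writes never touch the local slave; they reach it via replication.
        return self.master.set(key, value)

    def get(self, key):
        # Reads stay on the local machine and never cross the network.
        return self.local.get(key)


def make_client(master_host="redis-master.example", port=6379):
    """Build a SplitClient from two redis-py connections.

    The master hostname is a placeholder; the local slave is assumed
    to listen on 127.0.0.1 on the same port.
    """
    import redis  # redis-py, imported here so the class above has no dependency

    return SplitClient(
        master=redis.Redis(host=master_host, port=port),
        local=redis.Redis(host="127.0.0.1", port=port),
    )
```

Since Redis replication is asynchronous, a get() issued straight after a set() may still return the old value until the write has made its way down the chain, which sounds like it fits your "as long as they get there quickly" requirement.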

It sounds like you're thinking of running a Cassandra node on each web head with full replication and only reading locally. I'm not sure this is the best use case for it, and I would like to know what others think. I would imagine you would get better overall uptime and performance from running Cassandra as a cluster separate from the web heads, with less than full replication.


On 29 Jul, 2010, at 11:11 AM, Russ Brown <pickscrape@gmail.com> wrote:


I'm currently looking at NoSQL solutions to replace a bespoke system
that we currently have in place. Currently I think the best fit is
Cassandra, but I would like to get some feedback from those who know
it better before spending more time on it.

Our current system is geared to allowing our web servers to operate
very quickly and completely independently (for most pages) of other
servers. This is accomplished by keeping chunks of data about "things"
on each machine's disk with a file per entity. The key in this is
effectively the filename, with the value being the file's content. A
central server handles the initial generation (and subsequent updates)
of these files, and distribution to the web servers is carried out by
a combination of network share mounting and shell scripts.

The system *does* work: the servers are very fast and they do work
fine when the servers behind them disappear. However, the storage and
transport mechanisms are cumbersome, and we would like to see if there
are suitable alternatives available.

The idea is to replace the disk-based storage on each server with a
NoSQL solution using replication to handle the transport automatically
for us. What we need is:

* One "master", though being able to have a backup for it that we
could quickly bring into play would be advantageous
* Each "slave" must have a full copy of the data
* It does not matter if the slaves do not get updates immediately or
at exactly the same time, as long as they get there quickly
* Reads must be fast (though understandably it will probably be
slower than reading a system-cached file direct from disk)
* It would be a bonus if the slaves could be written to too, with the
writes making their way to the other nodes. This is probably a given,
but I thought I'd mention it anyway.

Now, I have read a few things about Cassandra's read performance which
is what has got me a bit worried. However, I have also read quite a
bit about its flexibility in terms of topology, and that the read
performance is very much dependent on how things are set up. For
example, a lot of what I've read describes how when querying a node it
will ask other nodes for information, which it then collates and
returns. Is it possible to configure Cassandra in such a way that a
node only ever asks itself for the data, and if so what sort of
effect will that have on read performance? Our current solution is
designed to avoid having to hit the network, so doing the same here
would be advantageous.

I have also read that Cassandra will distribute data between different
nodes, while we want all to have a full copy of all data. Is it
possible to configure Cassandra in this way?

If this will work, it will be a heck of a lot cleaner and easier to
maintain than the current solution, so we're quite hopeful. :)

Feedback appreciated,