Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <11a23f21-ecb4-6496-c300-9126777f8321@me.com>
References: <AANLkTi=j4-coc2xT8ximLBy3qukRwY9fe-gt+MbHM2vz@mail.gmail.com>
	<11a23f21-ecb4-6496-c300-9126777f8321@me.com>
Date: Thu, 29 Jul 2010 12:30:02 -0500
Message-ID: <AANLkTim61ihtUc-m9drz9wsV+bRmTHK=7hR4BaDFqrbb@mail.gmail.com>
Subject: Re: Evaluating Cassandra for our use case
From: Russ Brown <pickscrape@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Jul 28, 2010 at 9:13 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
> Have you considered Redis http://code.google.com/p/redis/?
>
> It may be more suited to the master-slave configuration you are after.
>
> - You can have a master to write to and a "slave master" that replicates
> from it; your web heads each run a local Redis that replicates from the
> slave master.
> - Backup at the master or the slave master
> - Writes to the write master would make their way to the web head slave.
> - Web heads only read from their local slave.
> - Reads will be all in memory and faster than disk
> - Redis can store a lot of data in memory and also use disk
> (http://blogzawodny.com/2010/07/24/200000000-keys-in-redis-2-0-0-rc3/)
> - Web heads would have to write to the master, not locally
>
> It sounds like you're thinking of running a Cassandra node on each web head
> with full replication and only reading locally. I'm not sure this is the
> best use case, and I would like to know what others think. I would imagine
> you would get better overall uptime and performance from running Cassandra
> as a cluster separate from the web heads, with less than full replication.
>
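
For reference, the chained layout described above can be modelled roughly
as follows. This is only a toy in-memory sketch, not a Redis client: the
Node class and the node names are hypothetical, and propagation here is
immediate, whereas real replication is asynchronous and would lag slightly.

```python
class Node:
    """Toy stand-in for one Redis instance (hypothetical, in-memory only)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.data = {}
        self.replicas = []
        self.upstream = upstream
        if upstream is not None:
            # Register with the node we replicate from.
            upstream.replicas.append(self)

    def root(self):
        # Walk up the chain to find the write master.
        node = self
        while node.upstream is not None:
            node = node.upstream
        return node

    def write(self, key, value):
        # Writes never land locally first; they are sent to the write master.
        self.root()._apply(key, value)

    def _apply(self, key, value):
        # Store the value, then push it down to every replica.
        self.data[key] = value
        for replica in self.replicas:
            replica._apply(key, value)

    def read(self, key):
        # Reads are served from the local copy only, with no network hop.
        return self.data.get(key)


# Assumed topology: write master -> slave master -> two web heads.
master = Node("write-master")
relay = Node("slave-master", upstream=master)
web1 = Node("web1", upstream=relay)
web2 = Node("web2", upstream=relay)

# A web head writes via the master; every node ends up with the value.
web1.write("thing:42", "payload")
print(web2.read("thing:42"))
```

The point of the intermediate slave master is that the write master only
has to feed one replica, however many web heads there are.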

Thanks for this, Aaron. It does actually look like Redis may be better
suited to our needs. I had originally discounted Redis because I had
the impression that it had volatile storage only, but now I see that
not to be the case.

Thanks again!

> Aaron
>
>
>
>
> On 29 Jul, 2010, at 11:11 AM, Russ Brown <pickscrape@gmail.com> wrote:
>
> Hi,
>
> I'm currently looking at NoSQL solutions to replace a bespoke system
> that we currently have in place. Currently I think the best fit is
> Cassandra, but I would like to get some feedback from those who know
> it better before spending more time on it.
>
> Our current system is geared to allowing our web servers to operate
> very quickly and completely independently (for most pages) of other
> servers. This is accomplished by keeping chunks of data about "things"
> on each machine's disk with a file per entity. The key in this is
> effectively the filename, with the value being the file's content. A
> central server handles the initial generation (and subsequent updates)
> of these files, and distribution to the web servers is carried out by
> a combination of network share mounting and shell scripts.
>
> The system *does* work: the servers are very fast and they do work
> fine when the servers behind them disappear. However, the storage and
> transport mechanisms are cumbersome, and we would like to see if there
> are suitable alternatives available.
>
> The idea is to replace the disk-based storage on each server with a
> NoSQL solution using replication to handle the transport automatically
> for us. What we need is:
>
> * One "master", though being able to have a backup for it that we
> could quickly bring into play would be advantageous
> * Each "slave" must have a full copy of the data
> * It does not matter if the slaves do not get updates immediately or
> at exactly the same time, as long as they get there quickly
> * Reads must be fast (though understandably it will probably be
> slower than reading a system-cached file direct from disk)
> * It would be a bonus if the slaves could be written to too, with the
> writes making their way to the other nodes. This is probably a given,
> but I thought I'd mention it anyway.
>
> Now, I have read a few things about Cassandra's read performance which
> is what has got me a bit worried. However, I have also read quite a
> bit about its flexibility in terms of topology, and that the read
> performance is very much dependent on how things are set up. For
> example, a lot of what I've read describes how when querying a node it
> will ask other nodes for information, which it then collates and
> returns. Is it possible to configure Cassandra in such a way that a
> node only ever asks itself for the data, and if so what sort of
> effect will that have on read performance? Our current solution is
> designed to avoid having to hit the network, so doing the same here
> would be advantageous.
>
> I have also read that Cassandra will distribute data between different
> nodes, while we want all to have a full copy of all data. Is it
> possible to configure Cassandra in this way?
>
> If this will work, it will be a heck of a lot cleaner and easier to
> maintain than the current solution, so we're quite hopeful. :)
>
> Feedback appreciated,
>
> --
>
> Russ
>


-- 

Russ