Date: Thu, 29 Jul 2010 12:30:02 -0500
Subject: Re: Evaluating Cassandra for our use case
From: Russ Brown
To: user@cassandra.apache.org

On Wed, Jul 28, 2010 at 9:13 PM, Aaron Morton wrote:
> Have you considered Redis http://code.google.com/p/redis/?
>
> It may be more suited to the master-slave configuration you are after.
>
> - You can have a master to write to, then slave to a slave master, then your
> web heads run a local redis and slave from the slave master.
> - Backup at the master or the slave master
> - Writes to the write master would make their way to the web head slave.
> - Web heads only read from their local slave.
> - Reads will be all in memory and faster than disk
> - Redis can store a lot of data in memory and also use disk
> (http://blogzawodny.com/2010/07/24/200000000-keys-in-redis-2-0-0-rc3/)
> - Web heads would have to write to the master, not locally
>
> It sounds like you're thinking of running a cassandra node on each web head
> with full replication and only reading locally, I'm not sure if this is the
> best use case. Would like to know what others think. I would imagine you
> would get better overall uptime and performance from running cassandra as
> a cluster separate from the web heads, with less than full replication.
>

Thanks for this, Aaron. It does actually look like Redis may be better
suited to our needs.
I had originally discounted Redis because I had the impression that it
had volatile storage only, but now I see that not to be the case.

Thanks again!

> Aaron
>
> On 29 Jul, 2010, at 11:11 AM, Russ Brown wrote:
>
> Hi,
>
> I'm currently looking at NoSQL solutions to replace a bespoke system
> that we currently have in place. Currently I think the best fit is
> Cassandra, but I would like to get some feedback from those who know
> it better before spending more time on it.
>
> Our current system is geared to allowing our web servers to operate
> very quickly and completely independently (for most pages) of other
> servers. This is accomplished by keeping chunks of data about "things"
> on each machine's disk with a file per entity. The key in this is
> effectively the filename, with the value being the file's content. A
> central server handles the initial generation (and subsequent updates)
> of these files, and distribution to the web servers is carried out by
> a combination of network share mounting and shell scripts.
>
> The system *does* work: the servers are very fast and they do work
> fine when the servers behind them disappear. However, the storage and
> transport mechanisms are cumbersome, and we would like to see if there
> are suitable alternatives available.
>
> The idea is to replace the disk-based storage on each server with a
> NoSQL solution using replication to handle the transport automatically
> for us.
> What we need is:
>
> * One "master", though being able to have a backup for it that we
> could quickly bring into play would be advantageous
> * Each "slave" must have a full copy of the data
> * It does not matter if the slaves do not get updates immediately or
> at exactly the same time, as long as they get there quickly
> * Reads must be fast (though understandably it will probably be
> slower than reading a system-cached file direct from disk)
> * It would be a bonus if the slaves could be written to too, with the
> writes making their way to the other nodes. This is probably a given,
> but I thought I'd mention it anyway.
>
> Now, I have read a few things about Cassandra's read performance which
> is what has got me a bit worried. However, I have also read quite a
> bit about its flexibility in terms of topology, and that the read
> performance is very much dependent on how things are set up. For
> example, a lot of what I've read describes how when querying a node it
> will ask other nodes for information, which it then collates and
> returns. Is it possible to configure Cassandra in such a way that a
> node only ever asks itself for the data, and if so what sort of
> effect will that have on read performance? Our current solution is
> designed to avoid having to hit the network, so doing the same here
> would be advantageous.
>
> I have also read that Cassandra will distribute data between different
> nodes, while we want all to have a full copy of all data. Is it
> possible to configure Cassandra in this way?
>
> If this will work, it will be a heck of a lot cleaner and easier to
> maintain than the current solution, so we're quite hopeful. :)
>
> Feedback appreciated,
>
> --
>
> Russ
>

--
Russ
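
The file-per-entity storage described in the quoted message (the key is
effectively the filename, the value is the file's content) can be
illustrated with a minimal Python sketch. This is not the poster's actual
code: the helper names put and get, the directory layout, and the
write-then-rename step are all assumptions made for illustration, and keys
are assumed to be plain filenames without path separators.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def put(root: Path, key: str, value: bytes) -> None:
    """Store value under key, one file per entity (hypothetical helper)."""
    root.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place, so a concurrent reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=root)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(value)
        os.replace(tmp_name, root / key)
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        os.unlink(tmp_name)
        raise


def get(root: Path, key: str) -> Optional[bytes]:
    """Return the stored value, or None if the key has no file."""
    try:
        return (root / key).read_bytes()
    except FileNotFoundError:
        return None
```

In this sketch a read touches only the local disk, which matches the
stated goal of web servers that keep working when the servers behind them
disappear; distributing the files to each web server is left out.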