From: Daniel Kluesing
To: "user@cassandra.apache.org"
Date: Wed, 28 Jul 2010 16:43:03 -0700
Subject: RE: Evaluating Cassandra for our use case

>Is it possible to configure Cassandra in such a way that a
>node only ever asks itself for the data, and if so what sort of
>effect will that have on read performance?

Check out the RingCache class, which lets you make your clients smart enough to ask the right server. (Also, if all nodes have all the data like you mention below, and you have your read consistency set to 1, you won't need to ask other nodes over the network.)

>I have also read that Cassandra will distribute data between different
>nodes, while we want all to have a full copy of all data. Is it
>possible to configure Cassandra in this way?

If you set the replication factor to the number of nodes, then every node will have a full copy. (That might get sticky if you add new servers, since I don't think you can change the replication factor once set.)

-----Original Message-----
From: Russ Brown [mailto:pickscrape@gmail.com]
Sent: Wednesday, July 28, 2010 4:11 PM
To: user@cassandra.apache.org
Subject: Evaluating Cassandra for our use case

Hi,

I'm currently looking at NoSQL solutions to replace a bespoke system that we currently have in place. Currently I think the best fit is Cassandra, but I would like to get some feedback from those who know it better before spending more time on it.

Our current system is geared to allowing our web servers to operate very quickly and completely independently (for most pages) of other servers. This is accomplished by keeping chunks of data about "things" on each machine's disk with a file per entity. The key in this is effectively the filename, with the value being the file's content. A central server handles the initial generation (and subsequent updates) of these files, and distribution to the web servers is carried out by a combination of network share mounting and shell scripts.

The system *does* work: the servers are very fast and they do work fine when the servers behind them disappear.
However, the storage and transport mechanisms are cumbersome, and we would like to see if there are suitable alternatives available.

The idea is to replace the disk-based storage on each server with a NoSQL solution using replication to handle the transport automatically for us.

What we need is:

* One "master", though being able to have a backup for it that we could quickly bring into play would be advantageous
* Each "slave" must have a full copy of the data
* It does not matter if the slaves do not get updates immediately or at exactly the same time, as long as they get there quickly
* Reads must be fast (though understandably it will probably be slower than reading a system-cached file direct from disk)
* It would be a bonus if the slaves could be written to too, with the writes making their way to the other nodes. This is probably a given, but I thought I'd mention it anyway.

Now, I have read a few things about Cassandra's read performance, which is what has got me a bit worried. However, I have also read quite a bit about its flexibility in terms of topology, and that the read performance is very much dependent on how things are set up.

For example, a lot of what I've read describes how when querying a node it will ask other nodes for information, which it then collates and returns. Is it possible to configure Cassandra in such a way that a node only ever asks itself for the data, and if so what sort of effect will that have on read performance? Our current solution is designed to avoid having to hit the network, so doing the same here would be advantageous.

I have also read that Cassandra will distribute data between different nodes, while we want all to have a full copy of all data. Is it possible to configure Cassandra in this way?

If this will work, it will be a heck of a lot cleaner and easier to maintain than the current solution, so we're quite hopeful. :)

Feedback appreciated,

--
Russ
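
To illustrate the replication-factor point made in the reply at the top of this message, here is a toy Python sketch of simple ring placement. It is not Cassandra's code or API: the helper names `token` and `replicas`, the node names, and the MD5-based ring are all assumptions made up for the example. The idea it models is that a key is stored on the node that owns its token plus the next nodes around the ring, so a replication factor equal to the node count puts every key on every node.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on a toy ring (hypothetical helper)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas(key, nodes, replication_factor):
    """Return the nodes that would hold `key` in this toy model.

    The first replica is the node whose token is the first one at or
    after the key's token (wrapping around the ring); the remaining
    replicas are the following nodes in ring order.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    ring = sorted(nodes, key=token)          # nodes in ring order
    tokens = [token(n) for n in ring]        # sorted, same order as ring
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


# Hypothetical node names for the example.
nodes = ["web1", "web2", "web3", "web4"]

# Replication factor == node count: every node holds the key.
full = replicas("page:42", nodes, replication_factor=len(nodes))

# Smaller replication factor: only two neighbouring nodes hold it.
partial = replicas("page:42", nodes, replication_factor=2)

print(sorted(full))   # every node in `nodes`, whatever the key
print(partial)        # two distinct nodes, adjacent on the ring
```

In this model, with the replication factor equal to the number of nodes, any node contacted already holds a copy of any key, which is the situation the reply describes for reads at consistency level 1.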