From: Tim Underwood
To: cassandra-user@incubator.apache.org
Cc: cassandra-dev@incubator.apache.org
Date: Fri, 20 Nov 2009 15:02:25 -0800
Subject: Re: Cassandra users survey
Message-ID: <5ae3b19e0911201502o30c681a5gc1821c0dd35256c0@mail.gmail.com>

My company runs a niche comparison shopping site where we take in all sorts of raw product data from various sources (retailers, manufacturers, distributors, etc.). We then have to take all that raw data and collapse it down across the data sources (e.g. product FOO from source A matches product BAR from source B) and eventually end up with a final product that gets surfaced to our website.

Cassandra's data model works great for the raw data, where columns are sparsely populated and updated. The SuperColumnFamily model works great for my collapsed data, where I need to track which bits of information came from which raw data.

I'm currently in testing (almost production). For this use case I'll only be using Cassandra on the backend and then indexing the final data into Apache Solr to power the frontend. My data is small enough to fit on a single node, so I don't have much use for the partitioning at this point. If anything I'd be more interested in a fully replicated setup where the ReplicationFactor is equal to the number of nodes.

I looked at most of the other NoSQL solutions (CouchDB, MongoDB, HBase, Hypertable, Dynomite, Voldemort).

One thing I'd love to see improved:

- Reading through all the data (or a specific key prefix) in a ColumnFamily seems slow. Cassandra is the bottleneck when I try to index data into Solr, and it looks like Cassandra's CPU usage is 2-3 times that of Solr's during the process.

I look forward to playing around with 0.5!

-Tim

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing. If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful. Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing). We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
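
For illustration only, the collapsed-data layout mentioned in the message (tracking which raw source supplied each piece of information) could be pictured roughly as below. This is a plain-Python sketch of a super-column-style nesting (row key -> super column -> column -> value), not Cassandra client code. Using the source as the super column, the "earlier source wins" merge rule, and every key, source name, and attribute are assumptions made for the example, not details from the message.

```python
from collections import defaultdict


def new_super_column_family():
    """Nested map: row key -> super column -> column -> value."""
    return defaultdict(lambda: defaultdict(dict))


def record(scf, product_key, source, attribute, value):
    """Store one attribute value under the (hypothetical) raw source it came from."""
    scf[product_key][source][attribute] = value


def collapse(scf, product_key, source_priority):
    """Merge attributes across sources.

    Assumed rule: sources listed earlier in source_priority win.
    Returns {attribute: (value, source)} so provenance is kept.
    """
    row = scf.get(product_key, {})  # .get avoids creating an empty row
    merged = {}
    for source in source_priority:
        for attribute, value in row.get(source, {}).items():
            merged.setdefault(attribute, (value, source))
    return merged


if __name__ == "__main__":
    scf = new_super_column_family()
    record(scf, "product:123", "retailer_a", "title", "FOO")
    record(scf, "product:123", "manufacturer_b", "title", "BAR")
    record(scf, "product:123", "manufacturer_b", "weight", "2kg")
    print(collapse(scf, "product:123", ["manufacturer_b", "retailer_a"]))
```

Keeping the source alongside each merged value is what lets a later step answer "where did this field come from?" without going back to the raw rows.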