From Tim Underwood
Subject Re: Cassandra users survey
Date Fri, 20 Nov 2009 23:02:25 GMT
My company runs a niche comparison shopping site where we take in all sorts
of raw product data from various sources (retailers, manufacturers,
distributors, etc.).  We then have to take all that raw data and collapse
it down across the data sources (e.g. product FOO from source A matches
product BAR from source B) and eventually end up with a final product that
gets surfaced to our website.

Cassandra's data model works great for the raw data where columns are
sparsely populated and updated.  The SuperColumnFamily model works great for
my collapsed data where I need to track which bits of information came from
which raw data.
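
To make the two layouts concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for a ColumnFamily and a SuperColumnFamily. It is only an illustration of the shape of the data, not Cassandra client code; the row keys, column names, values, and the collapse() helper are all made up.

```python
# Hypothetical illustration only: nested dicts standing in for Cassandra's
# ColumnFamily and SuperColumnFamily layouts. Keys and values are made up.

# ColumnFamily: row key -> {column name: value}.
# Each row stores only the columns it actually has (sparse).
raw_products = {
    "sourceA:FOO": {"name": "Widget", "price": "9.99"},
    "sourceB:BAR": {"name": "Widget Deluxe", "upc": "012345678905"},
}

# SuperColumnFamily: row key -> {super column: {column name: value}}.
# Here each super column is an attribute, and its columns map the raw row
# key to the value that raw row contributed.
collapsed_products = {}


def collapse(product_key, raw_keys):
    """Merge raw rows into one collapsed row, keeping the source of each value."""
    row = collapsed_products.setdefault(product_key, {})
    for raw_key in raw_keys:
        for column, value in raw_products[raw_key].items():
            row.setdefault(column, {})[raw_key] = value
    return row


merged = collapse("product:1", ["sourceA:FOO", "sourceB:BAR"])
```

With this shape, merged["name"] holds one entry per contributing raw row, so it stays visible which source supplied which value.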

I'm currently in testing (almost production).  For this use case I'll only
be using Cassandra on the backend and then indexing the final data into
Apache Solr to power the frontend.  My data is small enough to fit on a
single node so I don't have much use for the partitioning at this point.  If
anything I'd be more interested in a fully replicated setup where the
ReplicationFactor is equal to the number of nodes.

I looked at most of the other NoSQL solutions (CouchDB, MongoDB, HBase,
Hypertable, dynomite, Voldemort).

One thing I'd love to see improved:

- Reading through all the data (or all keys with a specific prefix) in a
ColumnFamily seems slow.  Cassandra is the bottleneck when I try to index
data into Solr, and it looks like Cassandra's CPU usage is 2-3 times Solr's
during the process.
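
The access pattern in question can be sketched as a paged scan over keys sharing a prefix. This is a minimal sketch in plain Python, assuming the row keys are available in sorted order; it does not call any Cassandra API, and the scan_prefix() name, the sample keys, and the page size are hypothetical.

```python
from bisect import bisect_left


def scan_prefix(sorted_keys, prefix, page_size=2):
    """Yield pages of keys that start with `prefix` from a sorted key list."""
    # Jump straight to the first key that could match the prefix.
    start = bisect_left(sorted_keys, prefix)
    while start < len(sorted_keys):
        page = []
        for key in sorted_keys[start:start + page_size]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can match either
            page.append(key)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # ran past the end of the prefix range
        start += len(page)


# Hypothetical row keys, already sorted.
keys = ["sourceA:BAZ", "sourceA:FOO", "sourceA:QUX", "sourceB:BAR"]
pages = list(scan_prefix(keys, "sourceA:"))
```

Each page would be handed to the indexer before the next one is fetched, so the whole key range never has to be held in memory at once.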

I look forward to playing around with 0.5!


On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis wrote:

> Hi all,
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
> Thanks,
> -Jonathan
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)

