incubator-cassandra-dev mailing list archives

From Brian O'Neill <>
Subject Re: Bitmap indexes - reviving CASSANDRA-1472
Date Thu, 11 Apr 2013 00:49:09 GMT

How does this compare with Druid?

We're currently evaluating Acunu, Vertica and Druid...

With its bitmapped indexes, Druid appears to have the most potential.  
They boast some pretty impressive stats, especially WRT handling "real-time" updates and adding
new dimensions.

They also use a compression algorithm, CONCISE, to cut down on the space requirements.

I haven't looked too deep into the Druid code, but I've been meaning to see if it could be
backed by C*.

We'd be game to join the hunt if you pursue such a beast. (with your code, or with portions
of Druid)


On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:

> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as give me all users that performed
> event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> for spending my time on C*. The only downside for us would be our current
> prototype is in C++ so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <> wrote:
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <>
>> wrote:
>>> I'm currently building a distributed cluster on top of cassandra to perform
>>> fast set manipulation via bitmap indexes. This gives me the ability to
>>> perform unions, intersections, and set subtraction across sub-queries.
>>> Currently I'm storing index information for thousands of dimensions as
>>> cassandra rows, and my cluster keeps this information cached, distributed
>>> and replicated in order to answer queries.
>>> Every couple of days I think to myself this should really exist in C*.
>>> Given all the benefits, would there be any interest in
>>> reviving CASSANDRA-1472?
>>> Some downsides are that this is very memory intensive, even for sparse
>>> bitmaps.
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder,
>> @spyced
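
For anyone following the thread, the query quoted above (all users that performed event 1, 2, and 3, but not 4) reduces to an intersection followed by a set subtraction over per-event bitmaps. Below is a rough sketch of that idea only. It is not the CASSANDRA-1472 patch, mrevilgnome's prototype, or Druid's implementation. It assumes each user has a small integer id and uses a plain Python int as an uncompressed bitmap; the names `bitmap_from_ids`, `ids_from_bitmap`, `users_matching`, and the sample `index` are all hypothetical.

```python
# Hypothetical sketch: one bitmap per event, stored as a Python int.
# Bit i is set when the user with integer id i performed that event.

def bitmap_from_ids(user_ids):
    """Build a bitmap with one bit set per user id."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def ids_from_bitmap(bits):
    """Return the sorted list of user ids whose bit is set."""
    ids = []
    uid = 0
    while bits:
        if bits & 1:
            ids.append(uid)
        bits >>= 1
        uid += 1
    return ids


def users_matching(index, required, excluded):
    """Intersect the bitmaps of all required events, then subtract
    the bitmap of every excluded event."""
    if not required:
        raise ValueError("at least one required event is needed")
    result = index.get(required[0], 0)
    for event in required[1:]:
        result &= index.get(event, 0)      # intersection
    for event in excluded:
        result &= ~index.get(event, 0)     # set subtraction
    return result


# Made-up sample data: event number -> bitmap of user ids.
index = {
    1: bitmap_from_ids([0, 1, 2, 3, 5]),
    2: bitmap_from_ids([1, 2, 3, 5, 8]),
    3: bitmap_from_ids([1, 2, 5, 8]),
    4: bitmap_from_ids([2, 9]),
}

# Users that performed events 1, 2, and 3, but not 4.
matches = ids_from_bitmap(users_matching(index, [1, 2, 3], [4]))
print(matches)
```

An uncompressed bitmap like this costs one bit per possible user id for every event, whether or not the bit is set, which is presumably where the memory concern for sparse bitmaps comes from and why a compressed representation such as CONCISE is attractive.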

Brian ONeill
Lead Architect, Health Market Science (
