incubator-cassandra-dev mailing list archives

From Brian O'Neill <b...@alumni.brown.edu>
Subject Re: Bitmap indexes - reviving CASSANDRA-1472
Date Thu, 11 Apr 2013 00:49:09 GMT

How does this compare with Druid?
https://github.com/metamx/druid

We're currently evaluating Acunu, Vertica and Druid...
http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html

With its bitmapped indexes, Druid appears to have the most potential.  
They boast some pretty impressive stats, especially WRT handling "real-time" updates and adding
new dimensions.

They also use a compression algorithm, CONCISE, to cut down on the space requirements.
http://ricerca.mat.uniroma3.it/users/colanton/concise.html

I haven't looked too deeply into the Druid code, but I've been meaning to see if it could be
backed by C*.

We'd be game to join the hunt if you pursue such a beast. (with your code, or with portions
of Druid)

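For anyone following along, the kind of query mrevilgnome describes below (all users that performed events 1, 2, and 3, but not 4) reduces to bitwise AND and AND NOT over one bitmap per event. Here is a minimal sketch in Python, using plain integers as uncompressed bitmaps. The event data and the helper names (build_bitmap, members) are made up for illustration; this is not the code from CASSANDRA-1472 or from Druid, and it does no compression such as CONCISE.

```python
def build_bitmap(user_ids):
    """Return an int whose bit i is set when user i performed the event."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def members(bitmap):
    """List the user ids whose bits are set, in ascending order."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical data: event id -> ids of the users who performed it.
events = {
    1: [0, 2, 3, 5, 7],
    2: [0, 2, 3, 7, 8],
    3: [2, 3, 6, 7],
    4: [3, 9],
}
index = {event: build_bitmap(users) for event, users in events.items()}

# Intersection (AND) of events 1, 2 and 3, then set subtraction (AND NOT) of event 4.
matching = index[1] & index[2] & index[3] & ~index[4]
print(members(matching))  # [2, 7]
```

A union is the same idea with | in place of &. A real index would want a compressed bitmap representation, since an uncompressed one costs a bit per user per dimension even when almost no bits are set.
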
-brian


On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:

> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as give me all users that performed
> event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> for spending my time on C*. The only downside for us would be our current
> prototype is in C++ so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
> 
> 
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> 
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>> 
>> 
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgnome@gmail.com>
>> wrote:
>> 
>>> I'm currently building a distributed cluster on top of cassandra to perform
>>> fast set manipulation via bitmap indexes. This gives me the ability to
>>> perform unions, intersections, and set subtraction across sub-queries.
>>> Currently I'm storing index information for thousands of dimensions as
>>> cassandra rows, and my cluster keeps this information cached, distributed
>>> and replicated in order to answer queries.
>>> 
>>> Every couple of days I think to myself this should really exist in C*.
>>> Given all the benefits, would there be any interest in
>>> reviving CASSANDRA-1472?
>>> 
>>> Some downsides are that this is very memory intensive, even for sparse
>>> bitmaps.
>>> 
>> 
>> 
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

