incubator-cassandra-dev mailing list archives

From Jonathan Ellis <>
Subject Re: Bitmap indexes - reviving CASSANDRA-1472
Date Fri, 12 Apr 2013 14:52:07 GMT
Something like this?

WHERE user_id IN (select user_id from events where type in (1, 2, 3))
  AND user_id NOT IN (select user_id from events where type=4)

This doesn't really look like a Cassandra query to me.  More like a
query for Hive (or Drill, or Impala).

But, I know Sylvain is looking forward to adding index support to
Collections [1], so something like this might fit:

WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
   AND NOT (events CONTAINS 4)

However, even this is more than our current query planner can handle;
we don't really handle disjunctions at all, except for the special
case of IN on the partition key (which translates to multiget), let
alone arbitrary logical predicates.

I think that between "bitmap indexes" and "query planning," the latter
is actually the hard part.  QueryProcessor is about at the limits of
tractable complexity already; I think we'd need a new approach if we
want to handle arbitrarily complex predicates like that.
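
To illustrate why the bitmap side is the easier half, here is a minimal sketch of the predicate above evaluated as plain set algebra. It is not Cassandra code: it uses Python integers as bitmaps (one bit per user), and the event data, `build_bitmaps`, and `users_in` are hypothetical names made up for the illustration.

```python
# Illustrative only: Python ints used as bitmaps, one bit per user.
# The (user_id, event_type) pairs below are made-up sample data.
from functools import reduce


def build_bitmaps(events):
    """Map each event type to a bitmap whose bit i is set if user i performed it."""
    bitmaps = {}
    for user_id, event_type in events:
        bitmaps[event_type] = bitmaps.get(event_type, 0) | (1 << user_id)
    return bitmaps


def users_in(bitmap):
    """Decode a non-negative bitmap back into a sorted list of user ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


events = [(0, 1), (0, 4), (1, 2), (2, 3), (2, 4), (3, 1), (3, 2)]
bitmaps = build_bitmaps(events)

# (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
any_of = reduce(lambda a, b: a | b, (bitmaps.get(t, 0) for t in (1, 2, 3)), 0)

# ... AND NOT (events CONTAINS 4): set subtraction via AND with the complement.
result = any_of & ~bitmaps.get(4, 0)
matching_users = users_in(result)
```

If the intent is "performed all of 1, 2, and 3" rather than "any of", the union (`|`) becomes an intersection (`&`), with the reduce starting from an all-ones mask instead of 0. Either way the per-bitmap work is trivial; deciding which bitmaps to fetch and in what order is the planning problem described above.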


On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome <> wrote:
> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as "give me all users that performed
> events 1, 2, and 3, but not 4." If the answer is yes, then I can make a case
> for spending my time on C*. The only downside for us would be that our current
> prototype is in C++, so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <> wrote:
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <> wrote:
>> > I'm currently building a distributed cluster on top of Cassandra to
>> > perform fast set manipulation via bitmap indexes. This gives me the
>> > ability to perform unions, intersections, and set subtraction across
>> > sub-queries. Currently I'm storing index information for thousands of
>> > dimensions as Cassandra rows, and my cluster keeps this information
>> > cached, distributed, and replicated in order to answer queries.
>> >
>> > Every couple of days I think to myself this should really exist in C*.
>> > Given all the benefits, would there be any interest in
>> > reviving CASSANDRA-1472?
>> >
>> > One downside is that this is very memory intensive, even for sparse
>> > bitmaps.
>> >
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder,
>> @spyced

Jonathan Ellis
Project Chair, Apache Cassandra
