incubator-cassandra-dev mailing list archives

From Ted Zlatanov <>
Subject Re: bitmap slices
Date Mon, 01 Feb 2010 15:11:11 GMT
On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov <> wrote: 

TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis <> wrote:

JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya
JE> <> wrote:
>>>   1. This would lead to an enormous amount of duplication of data, in short
>>>   if I now want to view the data from the IS_PUBLISHED dimension then my database
>>>   size would scale up tremendously.

JE> Yes.  But disk space is so cheap it's worth using a lot of it to make
JE> other things fast.

TZ> IIUC, Mehar would be duplicating the article data for every article tag.

TZ> I searched the bug tracker and wiki and didn't find anything on the
TZ> topic of tag storage and search, so I don't think Cassandra supports
TZ> tags without data duplication.

TZ> Would it be possible to implement an optional byte[] bitmap field in
TZ> SliceRange?  If you can specify the bitmap as an optional field it would
TZ> not break current clients.  Then the search can return only the subset
TZ> of the range that matches the bitmap.  This would make sense for
TZ> BytesType and LongType, at least.

I looked at the source code and it seems that
StorageProxy::getSliceRange() is the focal point for reads and bitmap
matching should be implemented there.  The bitmap could be applied as a
filter before the other SliceRange parameters, especially the max number
of return results.  It may be worth the effort to send the bitmap down
to the ReadCommand/ColumnFamily level to reduce the number of potential

If this is not feasible for technical reasons I'd like to know.
Otherwise I'll put it on my TODO list and produce a proposal (unless
someone more knowledgeable is interested, of course).
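
To make the idea concrete, here is a rough sketch of the filtering step in
plain Java. It is not Cassandra code: the class and method names are made up
for illustration, and the matching rule (a column name matches when every bit
set in the bitmap is also set in the name) is only my assumption about what
"matches the bitmap" would mean. A null bitmap stands in for the optional
field being absent, and the bitmap is applied before the count limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Cassandra.
class BitmapSliceSketch {

    // Assumed rule: every bit set in the bitmap must also be set in the name.
    // Names whose length differs from the bitmap are treated as non-matching.
    static boolean matches(byte[] name, byte[] bitmap) {
        if (name.length != bitmap.length) {
            return false;
        }
        for (int i = 0; i < bitmap.length; i++) {
            if ((name[i] & bitmap[i]) != bitmap[i]) {
                return false;
            }
        }
        return true;
    }

    // Walk the (already ordered) column names in the range, keep only the
    // ones that match the bitmap, and stop once 'count' results are collected.
    // A null bitmap means "no filter", so existing callers see no change.
    static List<byte[]> slice(List<byte[]> names, byte[] bitmap, int count) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] name : names) {
            if (out.size() >= count) {
                break;
            }
            if (bitmap == null || matches(name, bitmap)) {
                out.add(name);
            }
        }
        return out;
    }
}
```

Because the filter runs before the limit, asking for one result returns the
first column that matches, not the first column in the range if it happens
to match.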

