incubator-cassandra-dev mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: bitmap slices
Date Mon, 01 Feb 2010 16:55:44 GMT
Predicates for values would be nice; >, <, = and others would be quite useful.
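A minimal Java sketch of what value predicates might look like. It assumes column values are longs and that a predicate simply filters the columns of one row. `ValuePredicates`, `Op`, `predicate` and `matching` are hypothetical names used only for illustration; they are not a Cassandra API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

// Hypothetical sketch: value predicates (>, <, =) over one row's columns.
// Not a Cassandra API; column values are assumed to be longs.
public class ValuePredicates {
    enum Op { GT, LT, EQ }

    // Build a predicate for "value <op> operand".
    static LongPredicate predicate(Op op, long operand) {
        switch (op) {
            case GT: return v -> v > operand;
            case LT: return v -> v < operand;
            default: return v -> v == operand;
        }
    }

    // Names of the columns whose values satisfy the predicate,
    // in the row's column order.
    static List<String> matching(Map<String, Long> row, LongPredicate p) {
        return row.entrySet().stream()
                .filter(e -> p.test(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> row = new LinkedHashMap<>();
        row.put("views", 120L);
        row.put("likes", 7L);
        row.put("shares", 7L);
        System.out.println(matching(row, predicate(Op.GT, 10))); // [views]
        System.out.println(matching(row, predicate(Op.EQ, 7)));  // [likes, shares]
    }
}
```

Other operators (>=, <=, !=) would be further enum cases in the same style.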

jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Mon, Feb 1, 2010 at 10:41, Jonathan Ellis <jbellis@gmail.com> wrote:
> I don't think this is very useful for column names.  I could see it
> being useful for values but if we're going to add predicate queries
> then I'd rather do something more general.
>
> 2010/2/1 Ted Zlatanov <tzz@lifelogs.com>:
>> On Mon, 1 Feb 2010 09:42:16 -0600 Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> JE> 2010/2/1 Ted Zlatanov <tzz@lifelogs.com>:
>>>> On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov <tzz@lifelogs.com> wrote:
>>>>
>> TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis <jbellis@gmail.com> wrote:
>> JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya
>> JE> <meharchaitanya@gmail.com> wrote:
>>>>>>>   1. This would lead to an enormous amount of duplication of data. In short,
>>>>>>>   if I now want to view the data from the IS_PUBLISHED dimension, then my
>>>>>>>   database size would scale up tremendously.
>>>>
>> JE> Yes.  But disk space is so cheap it's worth using a lot of it to make
>> JE> other things fast.
>>>>
>> TZ> IIUC, Mehar would be duplicating the article data for every article tag.
>>>>
>> TZ> I searched the bug tracker and wiki and didn't find anything on the
>> TZ> topic of tag storage and search, so I don't think Cassandra supports
>> TZ> tags without data duplication.
>>>>
>> TZ> Would it be possible to implement an optional byte[] bitmap field in
>> TZ> SliceRange?  If you can specify the bitmap as an optional field it would
>> TZ> not break current clients.  Then the search can return only the subset
>> TZ> of the range that matches the bitmap.  This would make sense for
>> TZ> BytesType and LongType, at least.
>>>>
>>>> I looked at the source code and it seems that
>>>> StorageProxy::getSliceRange() is the focal point for reads and bitmap
>>>> matching should be implemented there.  The bitmap could be applied as a
>>>> filter before the other SliceRange parameters, especially the max number
>>>> of return results.  It may be worth the effort to send the bitmap down
>>>> to the ReadCommand/ColumnFamily level to reduce the number of potential
>>>> matches.
>>>>
>>>> If this is not feasible for technical reasons I'd like to know.
>>>> Otherwise I'll put it on my TODO list and produce a proposal (unless
>>>> someone more knowledgeable is interested, of course).
>>
>> JE> how would this be different than the byte[] column name you can
>> JE> already match on?
>>
>> Given byte columns
>>
>> A 0110
>> B 0111
>> C 0101
>>
>> the bitmask approach would let you specify a bitmask of "0011" and get
>> only B.  It's just an AND that looks for a non-zero value.  So you can
>> say "0111" and get A, B, and C.  Or "0010" to get A and B.  "1000" gets
>> nothing.
>>
>> Cassandra could support OR-ed multiples for better queries, so you could
>> ask for (0001,0010) to get A, B, and C.
>>
>> Ted
>>
>>
>
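To make the bitmask example in the thread concrete, here is a minimal Java sketch. It models column names as small int bitfields and, following the proposal, applies the bitmap filter before the slice's count limit, so only matching columns count toward the limit. `BitmapSlice`, `matchesAny`, `matchesAll` and `slice` are hypothetical names, not Cassandra APIs. Note that the plain "AND that looks for a non-zero value" test gives A, B and C for "0111" and A and B for "0010", but it also matches all three columns for "0011"; getting only B for "0011" requires every mask bit to be set, which is what `matchesAll` checks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed bitmap filter on a slice.
// Not a Cassandra API; column names are modeled as small int bitfields.
public class BitmapSlice {
    // "AND that looks for a non-zero value": any mask bit is set in the name.
    static boolean matchesAny(int name, int mask) {
        return (name & mask) != 0;
    }

    // Stricter alternative: every mask bit is set in the name.
    static boolean matchesAll(int name, int mask) {
        return (name & mask) == mask;
    }

    // OR-ed masks: keep a column if it matches at least one mask (non-zero
    // AND test). Only matching columns count toward the limit, which is the
    // same as filtering first and then truncating to `count`.
    static List<String> slice(Map<String, Integer> columns, int[] masks, int count) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> col : columns.entrySet()) {
            if (result.size() >= count) {
                break;
            }
            for (int mask : masks) {
                if (matchesAny(col.getValue(), mask)) {
                    result.add(col.getKey());
                    break; // one matching mask is enough
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("A", 0b0110);
        cols.put("B", 0b0111);
        cols.put("C", 0b0101);

        System.out.println(slice(cols, new int[] {0b0111}, 100));         // [A, B, C]
        System.out.println(slice(cols, new int[] {0b0010}, 100));         // [A, B]
        System.out.println(slice(cols, new int[] {0b1000}, 100));         // []
        System.out.println(slice(cols, new int[] {0b0001, 0b0010}, 100)); // [A, B, C]
        System.out.println(slice(cols, new int[] {0b0001, 0b0010}, 2));   // [A, B]
        System.out.println(matchesAll(cols.get("A"), 0b0011));            // false
        System.out.println(matchesAll(cols.get("B"), 0b0011));            // true
    }
}
```

The two tests only differ for multi-bit masks; for single-bit masks such as those in the OR-ed (0001,0010) example they give the same result.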
