incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tatu Saloranta <tsalora...@gmail.com>
Subject Re: Tree Search in Cassandra
Date Mon, 07 Jun 2010 01:11:20 GMT
Yeah, or maybe just "clustering", since there is no branching structure.
It's quite commonly useful even on regular b-tree style storage (BDB
et al), as it can reduce per-entry overhead quite a bit. And allows
very efficient compression, if entries have lots of redundancy (xml or
json serialized data).

I doubt this can be done reliably from client perspective. While a
good idea from functionality perspective, problem is that it requires
some level of atomic operations or locking, since updates are
multi-step operations. From server side I guess it would be similar to
work on allowing atomic multi-part operations (like ones being worked
on to implement counters?).

-+ Tatu +-

On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory <rantav@gmail.com> wrote:
> sounds interesting... btree on top of cassandra ;)
>
> On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn <david@lookin2.com> wrote:
>>
>> I'm still thinking about the problem of how to handle range queries on
>> very large sets of data, using Random Partitioning.
>>
>> Has anyone used tree search to solve this? What do you think?
>>
>> More specifically, something like this:
>>
>> - Store a maximum of 1000 values per supercolumn (or some other fixed
>> number)
>> - Each supercolumn has a "greaterChild" and a "lessChild" in addition to
>> the values
>> - When the number of values in the supercolumn grows beyond the maximum,
>> split it into 3 parts, with the top third going into "greaterChild" and the
>> bottom third into "lessChild"
>> - To find a value, look at "greaterChild" and "lessChild" to find out
>> whether your key is within the current range, and if not, where to look next
>> - Range searches mean finding the first value, then looking at
>> "greaterChild" or "lessChild" (depending on the direction of your search)
>> until you reach the end of the range.
>>
>> Super Column Family:
>>
>> index [ <columnFamilyId> [ "firstVal" : <val> ,
>>                            "lastVal" : <val> ,
>>                            <val> : <dataId>,
>>                            "lessChild" : <columnFamilyId>
,
>>                            "greaterChild" : <columnFamilyId>
]
>>
>
>

Mime
View raw message