cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Boxenhorn <da...@lookin2.com>
Subject Re: Tree Search in Cassandra
Date Mon, 07 Jun 2010 07:06:42 GMT
I wonder if there is a robust algorithm for maintaining b-trees that doesn't
require atomicity? How about if you create the three new super columns
first, then attach them to the parent, then delete the old super column? If
it fails, it would leave junk, but that could be cleaned up every once in a
while.

On Mon, Jun 7, 2010 at 9:43 AM, Ran Tavory <rantav@gmail.com> wrote:

> Quote from Gary:
>
> batch_mutate makes no atomicity guarantees.  It s intended to help avoiding
> many round-trips.
> It can fail half-way through leaving you with a partially completed batch.
>
> On Mon, Jun 7, 2010 at 9:39 AM, David Boxenhorn <david@lookin2.com> wrote:
>
>> Is batch mutate atomic? If not, can we make it so?
>>
>> On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta <tsaloranta@gmail.com>wrote:
>>
>>> Yeah, or maybe just "clustering", since there is no branching structure.
>>> It's quite commonly useful even on regular b-tree style storage (BDB
>>> et al), as it can reduce per-entry overhead quite a bit. And allows
>>> very efficient compression, if entries have lots of redundancy (xml or
>>> json serialized data).
>>>
>>> I doubt this can be done reliably from client perspective. While a
>>> good idea from functionality perspective, problem is that it requires
>>> some level of atomic operations or locking, since updates are
>>> multi-step operations. From server side I guess it would be similar to
>>> work on allowing atomic multi-part operations (like ones being worked
>>> on to implement counters?).
>>>
>>> -+ Tatu +-
>>>
>>> On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory <rantav@gmail.com> wrote:
>>> > sounds interesting... btree on top of cassandra ;)
>>> >
>>> > On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn <david@lookin2.com>
>>> wrote:
>>> >>
>>> >> I'm still thinking about the problem of how to handle range queries
on
>>> >> very large sets of data, using Random Partitioning.
>>> >>
>>> >> Has anyone used tree search to solve this? What do you think?
>>> >>
>>> >> More specifically, something like this:
>>> >>
>>> >> - Store a maximum of 1000 values per supercolumn (or some other fixed
>>> >> number)
>>> >> - Each supercolumn has a "greaterChild" and a "lessChild" in addition
>>> to
>>> >> the values
>>> >> - When the number of values in the supercolumn grows beyond the
>>> maximum,
>>> >> split it into 3 parts, with the top third going into "greaterChild"
>>> and the
>>> >> bottom third into "lessChild"
>>> >> - To find a value, look at "greaterChild" and "lessChild" to find out
>>> >> whether your key is within the current range, and if not, where to
>>> look next
>>> >> - Range searches mean finding the first value, then looking at
>>> >> "greaterChild" or "lessChild" (depending on the direction of your
>>> search)
>>> >> until you reach the end of the range.
>>> >>
>>> >> Super Column Family:
>>> >>
>>> >> index [ <columnFamilyId> [ "firstVal" : <val> ,
>>> >>                            "lastVal" : <val> ,
>>> >>                            <val> : <dataId>,
>>> >>                            "lessChild" : <columnFamilyId> ,
>>> >>                            "greaterChild" : <columnFamilyId> ]
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Mime
View raw message