Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 21074 invoked from network); 7 Jun 2010 06:39:40 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 7 Jun 2010 06:39:40 -0000 Received: (qmail 74520 invoked by uid 500); 7 Jun 2010 06:39:39 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 74384 invoked by uid 500); 7 Jun 2010 06:39:39 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 74374 invoked by uid 99); 7 Jun 2010 06:39:38 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Jun 2010 06:39:38 +0000 X-ASF-Spam-Status: No, hits=2.9 required=10.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [74.125.83.44] (HELO mail-gw0-f44.google.com) (74.125.83.44) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 Jun 2010 06:39:30 +0000 Received: by gwj21 with SMTP id 21so402569gwj.31 for ; Sun, 06 Jun 2010 23:39:08 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.117.1 with SMTP id p1mr14540079ybc.274.1275892748299; Sun, 06 Jun 2010 23:39:08 -0700 (PDT) Received: by 10.151.51.12 with HTTP; Sun, 6 Jun 2010 23:39:08 -0700 (PDT) X-Originating-IP: [80.179.102.198] In-Reply-To: References: Date: Mon, 7 Jun 2010 09:39:08 +0300 Message-ID: Subject: Re: Tree Search in Cassandra From: David Boxenhorn To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=000e0cd70c4008acb104886aee60 X-Virus-Checked: Checked by ClamAV on apache.org --000e0cd70c4008acb104886aee60 Content-Type: text/plain; charset=ISO-8859-1 Is batch mutate atomic? If not, can we make it so? On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta wrote: > Yeah, or maybe just "clustering", since there is no branching structure. > It's quite commonly useful even on regular b-tree style storage (BDB > et al), as it can reduce per-entry overhead quite a bit. And allows > very efficient compression, if entries have lots of redundancy (xml or > json serialized data). > > I doubt this can be done reliably from client perspective. While a > good idea from functionality perspective, problem is that it requires > some level of atomic operations or locking, since updates are > multi-step operations. From server side I guess it would be similar to > work on allowing atomic multi-part operations (like ones being worked > on to implement counters?). > > -+ Tatu +- > > On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory wrote: > > sounds interesting... btree on top of cassandra ;) > > > > On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn > wrote: > >> > >> I'm still thinking about the problem of how to handle range queries on > >> very large sets of data, using Random Partitioning. > >> > >> Has anyone used tree search to solve this? What do you think? > >> > >> More specifically, something like this: > >> > >> - Store a maximum of 1000 values per supercolumn (or some other fixed > >> number) > >> - Each supercolumn has a "greaterChild" and a "lessChild" in addition to > >> the values > >> - When the number of values in the supercolumn grows beyond the maximum, > >> split it into 3 parts, with the top third going into "greaterChild" and > the > >> bottom third into "lessChild" > >> - To find a value, look at "greaterChild" and "lessChild" to find out > >> whether your key is within the current range, and if not, where to look > next > >> - Range searches mean finding the first value, then looking at > >> "greaterChild" or "lessChild" (depending on the direction of your > search) > >> until you reach the end of the range. > >> > >> Super Column Family: > >> > >> index [ [ "firstVal" : , > >> "lastVal" : , > >> : , > >> "lessChild" : , > >> "greaterChild" : ] > >> > > > > > --000e0cd70c4008acb104886aee60 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Is batch mutate atomic? If not, can we make it so?
On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta = <tsaloranta@gm= ail.com> wrote:
Yeah, or maybe ju= st "clustering", since there is no branching structure.
It's quite commonly useful even on regular b-tree style storage (BDB et al), as it can reduce per-entry overhead quite a bit. And allows
very efficient compression, if entries have lots of redundancy (xml or
json serialized data).

I doubt this can be done reliably from client perspective. While a
good idea from functionality perspective, problem is that it requires
some level of atomic operations or locking, since updates are
multi-step operations. From server side I guess it would be similar to
work on allowing atomic multi-part operations (like ones being worked
on to implement counters?).

-+ Tatu +-

On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory <rantav@gmail.com> wrote:
> sounds interesting... btree on top of cassandra ;)
>
> On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn <david@lookin2.com> wrote:
>>
>> I'm still thinking about the problem of how to handle range qu= eries on
>> very large sets of data, using Random Partitioning.
>>
>> Has anyone used tree search to solve this? What do you think?
>>
>> More specifically, something like this:
>>
>> - Store a maximum of 1000 values per supercolumn (or some other fi= xed
>> number)
>> - Each supercolumn has a "greaterChild" and a "less= Child" in addition to
>> the values
>> - When the number of values in the supercolumn grows beyond the ma= ximum,
>> split it into 3 parts, with the top third going into "greater= Child" and the
>> bottom third into "lessChild"
>> - To find a value, look at "greaterChild" and "less= Child" to find out
>> whether your key is within the current range, and if not, where to= look next
>> - Range searches mean finding the first value, then looking at
>> "greaterChild" or "lessChild" (depending on th= e direction of your search)
>> until you reach the end of the range.
>>
>> Super Column Family:
>>
>> index [ <columnFamilyId> [ "firstVal" : <val>= ; ,
>> =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 "lastVal" : <val> ,
>> =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 <val> : <dataId>,
>> =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 "lessChild" : <columnFamilyId> ,
>> =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 "greaterChild" : <columnFamilyId> ]
>>
>
>

--000e0cd70c4008acb104886aee60--